# LLM Sidecar Deployment Guide

## Quick Install

On the Main PC:

```bash
# 1. Copy the service file
sudo cp deploy/llm-sidecar.service /etc/systemd/system/

# 2. Create the working directory and copy files
mkdir -p /home/bigt/AI/llm
cp deploy/manifest.yaml /home/bigt/AI/llm/manifest.yaml
# Copy the sidecar Python package (app.py + manifest.py)
cp -r sidecar/ /home/bigt/AI/llm/sidecar/
# Copy requirements.txt for the venv
cp requirements.txt /home/bigt/AI/llm/

# 3. Create a Python virtual environment with dependencies
python3 -m venv /home/bigt/AI/llm/venv
/home/bigt/AI/llm/venv/bin/pip install -r /home/bigt/AI/llm/requirements.txt

# 4. Create a .env for the sidecar (optional)
cat > /home/bigt/AI/llm/.env << 'EOF'
# Sidecar configuration
MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
SIDECAR_PORT=8080
EOF

# 5. Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llm-sidecar

# 6. Verify it's running
sudo systemctl status llm-sidecar
```

## Verify

```bash
# The sidecar examples below use port 8081. Substitute the port the sidecar
# is actually listening on (SIDECAR_PORT, default 8080) if it differs.

# Check sidecar is responding
curl http://10.0.4.11:8081/models/available

# Check model status
curl http://10.0.4.11:8081/models/status

# Test the router
curl http://10.0.4.100:9001/v1/models
```

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MANIFEST_PATH` | `/home/bigt/AI/llm/manifest.yaml` | Path to the YAML manifest file |
| `SIDECAR_PORT` | `8080` | Port the sidecar listens on |

### Manifest Format

```yaml
- id: model-id
  name: "Display Name"
  model_path: "/path/to/model.gguf"
  flags:                    # Arbitrary llama-server flags
    n_ctx: 8192
    n_gpu_layers: 35
```

- `id`: Unique identifier used in the `model` field of chat completion requests
- `name`: Human-readable display name
- `model_path`: Absolute path to the GGUF file
- `flags`: Any llama-server CLI flags (`n_ctx`, `n_gpu_layers`, etc.)
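The field rules above can be checked before restarting the service. The following is a minimal sketch, not the sidecar's own validation: `validate_manifest` and `REQUIRED_KEYS` are hypothetical names, and it assumes the manifest root is a list (as in the example), that `id`, `name`, and `model_path` are all required, and that `flags` is optional.

```python
from pathlib import Path, PurePosixPath

# Assumption: these three keys are mandatory; `flags` is treated as optional.
REQUIRED_KEYS = ("id", "name", "model_path")


def validate_manifest(entries, check_files=False):
    """Return a list of problem descriptions; an empty list means none were found."""
    if not isinstance(entries, list):
        return ["manifest root must be a list of model entries"]

    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        label = f"entry {index}"
        if not isinstance(entry, dict):
            problems.append(f"{label}: expected a mapping")
            continue

        # Every required key must be a non-empty string.
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str) or not entry[key]:
                problems.append(f"{label}: missing or empty '{key}'")

        # `id` must be unique across the manifest.
        model_id = entry.get("id")
        if isinstance(model_id, str) and model_id:
            if model_id in seen_ids:
                problems.append(f"{label}: duplicate id '{model_id}'")
            seen_ids.add(model_id)

        # `model_path` must be absolute; optionally confirm the file exists.
        path = entry.get("model_path")
        if isinstance(path, str) and path:
            if not PurePosixPath(path).is_absolute():
                problems.append(f"{label}: model_path '{path}' is not absolute")
            elif check_files and not Path(path).is_file():
                problems.append(f"{label}: model_path '{path}' does not exist")

        # `flags`, when present, must be a mapping of flag name to value.
        flags = entry.get("flags") or {}
        if not isinstance(flags, dict):
            problems.append(f"{label}: 'flags' must be a mapping")

    return problems
```

To run it against the real file, load the manifest with `yaml.safe_load(open("/home/bigt/AI/llm/manifest.yaml"))` (PyYAML) and pass the result to `validate_manifest(..., check_files=True)`.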

## Managing the Service

```bash
# Start/Stop/Restart
sudo systemctl start llm-sidecar
sudo systemctl stop llm-sidecar
sudo systemctl restart llm-sidecar

# View logs
sudo journalctl -u llm-sidecar -f

# Check status
sudo systemctl status llm-sidecar

# Disable auto-start
sudo systemctl disable llm-sidecar
```

## Troubleshooting

- **Sidecar not starting**: Check `sudo journalctl -u llm-sidecar -n 50`
- **Manifest errors**: Check that YAML is valid (`python3 -c "import yaml; yaml.safe_load(open('manifest.yaml'))"`)
- **llama-server crashes**: Sidecar auto-restarts it up to 3 times before the circuit breaker opens
- **Port conflict**: Change `SIDECAR_PORT` in the service environment
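
The restart limit mentioned under **llama-server crashes** can be pictured with a small sketch. This is an illustration of the general pattern only, not the sidecar's actual code: the `RestartBreaker` class, its method names, and the `reset` behaviour are assumptions.

```python
class RestartBreaker:
    """Allow a fixed number of restarts, then refuse further ones (breaker open)."""

    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts
        self.restarts = 0   # restarts granted so far
        self.open = False   # True once the restart budget is exhausted

    def record_crash(self):
        """Return True if the process should be restarted, False if the breaker is open."""
        if self.open:
            return False
        if self.restarts >= self.max_restarts:
            # Budget used up: open the breaker and stop restarting.
            self.open = True
            return False
        self.restarts += 1
        return True

    def reset(self):
        """Close the breaker again (assumed to happen on a manual restart)."""
        self.restarts = 0
        self.open = False


breaker = RestartBreaker(max_restarts=3)
decisions = [breaker.record_crash() for _ in range(5)]
print(decisions)  # [True, True, True, False, False]
```

In this sketch the first three crashes each trigger a restart and the fourth opens the breaker. If the real sidecar behaves similarly, an open breaker would likely need a manual `sudo systemctl restart llm-sidecar` after the underlying crash is fixed.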