LLM Sidecar — Deployment Guide
Quick Install
On the Main PC:
# 1. Copy the service file
sudo cp deploy/llm-sidecar.service /etc/systemd/system/
# 2. Create the working directory and copy files
mkdir -p /home/bigt/AI/llm
cp deploy/manifest.yaml /home/bigt/AI/llm/manifest.yaml
# Copy the sidecar Python package (app.py + manifest.py)
cp -r sidecar /home/bigt/AI/llm/
# Copy requirements.txt for the venv
cp requirements.txt /home/bigt/AI/llm/
# 3. Create a Python virtual environment with dependencies
python3 -m venv /home/bigt/AI/llm/venv
/home/bigt/AI/llm/venv/bin/pip install -r /home/bigt/AI/llm/requirements.txt
# 4. Create a .env for the sidecar (optional)
cat > /home/bigt/AI/llm/.env << 'EOF'
# Sidecar configuration
MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
SIDECAR_PORT=8080
EOF
# 5. Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llm-sidecar
# 6. Verify it's running
sudo systemctl status llm-sidecar
Verify
# Check the sidecar is responding (substitute your host, and the port set in SIDECAR_PORT)
curl http://10.0.4.11:8081/models/available
# Check model status
curl http://10.0.4.11:8081/models/status
# Test the router
curl http://10.0.4.100:9001/v1/models
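The same checks can be scripted so they report a clear pass or fail per endpoint. The following is a minimal sketch using only the Python standard library; the names check_endpoints, http_get, SIDECAR_URL and PATHS are illustrative and not part of the sidecar, and the address is simply the one from the curl examples above, so adjust it to your host and SIDECAR_PORT.

```python
import urllib.error
import urllib.request

# Assumed address, taken from the curl examples; adjust host and port as needed.
SIDECAR_URL = "http://10.0.4.11:8081"
PATHS = ("/models/available", "/models/status")


def http_get(url, timeout=5.0):
    """Return (status_code, body_text) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def check_endpoints(base_url, paths, fetch=http_get):
    """Probe each path and return {path: (ok, detail)}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status, body = fetch(url)
        except (urllib.error.URLError, OSError) as exc:
            # Connection refused, timeout, DNS failure, or an HTTP error status.
            results[path] = (False, f"request failed: {exc}")
            continue
        results[path] = (status == 200, f"HTTP {status}, {len(body)} bytes")
    return results
```

On a machine that can reach the sidecar, check_endpoints(SIDECAR_URL, PATHS) returns one entry per path. The fetch parameter exists only so the function can be exercised without a network connection.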
Configuration
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| MANIFEST_PATH | /home/bigt/AI/llm/manifest.yaml | Path to the YAML manifest file |
| SIDECAR_PORT | 8080 | Port the sidecar listens on |
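To illustrate how these two variables and their defaults fit together, here is a hedged sketch of a config loader. It is not the sidecar's actual code: SidecarConfig and load_config are hypothetical names, and the port range check is an added assumption.

```python
import os
from dataclasses import dataclass

# Defaults as documented in the table above.
DEFAULT_MANIFEST_PATH = "/home/bigt/AI/llm/manifest.yaml"
DEFAULT_SIDECAR_PORT = 8080


@dataclass(frozen=True)
class SidecarConfig:
    manifest_path: str
    port: int


def load_config(env=None):
    """Build a config from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    manifest_path = env.get("MANIFEST_PATH", DEFAULT_MANIFEST_PATH)
    raw_port = env.get("SIDECAR_PORT", str(DEFAULT_SIDECAR_PORT))
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"SIDECAR_PORT must be an integer, got {raw_port!r}") from None
    if not 1 <= port <= 65535:  # assumption: reject values outside the TCP port range
        raise ValueError(f"SIDECAR_PORT out of range: {port}")
    return SidecarConfig(manifest_path=manifest_path, port=port)
```

Calling load_config() with no arguments reads the real process environment; passing a dict makes the behaviour easy to check in isolation.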
Manifest Format
- id: model-id
  name: "Display Name"
  model_path: "/path/to/model.gguf"
  flags:  # Arbitrary llama-server flags
    n_ctx: 8192
    n_gpu_layers: 35
- id: Unique identifier, used in the model field of chat completion requests
- name: Human-readable display name
- model_path: Absolute path to the GGUF file
- flags: Any llama-server CLI flags (n_ctx, n_gpu_layers, etc.)
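The field descriptions above translate naturally into a few checks that can be run before restarting the service. The sketch below is an illustration, not the sidecar's own validation: validate_manifest is a hypothetical helper, it works on already-parsed data (for example the result of yaml.safe_load from PyYAML), and treating flags as optional is an assumption.

```python
from pathlib import PurePosixPath


def validate_manifest(entries):
    """Return a list of problems; an empty list means the manifest looks valid."""
    if not isinstance(entries, list):
        return ["manifest must be a list of model entries"]
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        label = f"entry {index}"
        if not isinstance(entry, dict):
            problems.append(f"{label}: must be a mapping")
            continue
        # id, name and model_path must all be non-empty strings.
        for key in ("id", "name", "model_path"):
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"{label}: '{key}' must be a non-empty string")
        # id must be unique across the manifest.
        model_id = entry.get("id")
        if isinstance(model_id, str) and model_id:
            if model_id in seen_ids:
                problems.append(f"{label}: duplicate id '{model_id}'")
            seen_ids.add(model_id)
        # model_path must be absolute.
        path = entry.get("model_path")
        if isinstance(path, str) and path and not PurePosixPath(path).is_absolute():
            problems.append(f"{label}: model_path '{path}' is not absolute")
        # Assumption: flags may be omitted, but must be a mapping when present.
        if not isinstance(entry.get("flags", {}), dict):
            problems.append(f"{label}: 'flags' must be a mapping")
    return problems
```

The function does not check that the GGUF file exists or that the flag names are ones llama-server accepts; both would need to be verified on the target machine.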
Managing the Service
# Start/Stop/Restart
sudo systemctl start llm-sidecar
sudo systemctl stop llm-sidecar
sudo systemctl restart llm-sidecar
# View logs
sudo journalctl -u llm-sidecar -f
# Check status
sudo systemctl status llm-sidecar
# Disable auto-start
sudo systemctl disable llm-sidecar
Troubleshooting
- Sidecar not starting: Check the recent logs with sudo journalctl -u llm-sidecar -n 50
- Manifest errors: Check that the YAML is valid (python3 -c "import yaml; yaml.safe_load(open('manifest.yaml'))")
- llama-server crashes: The sidecar auto-restarts it up to 3 times before the circuit breaker opens
- Port conflict: Change SIDECAR_PORT in the service environment
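The restart limit mentioned for llama-server crashes can be pictured as a simple restart budget. This is a rough sketch of the idea only, not the sidecar's implementation: supervise and run_once are hypothetical names, and details such as backoff delays or whether the counter ever resets are not covered by this guide and are left out.

```python
def supervise(run_once, max_restarts=3):
    """Run `run_once` until it exits cleanly or the restart budget is spent.

    `run_once` is a hypothetical callable that starts llama-server, blocks
    until the process exits, and returns its exit code.
    Returns (breaker_open, restarts_used).
    """
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            return False, restarts  # clean exit: breaker stays closed
        if restarts >= max_restarts:
            return True, restarts   # budget spent: breaker opens, no more restarts
        restarts += 1               # crash: use one restart and try again
```

With the default budget, a process that crashes every time is started four times in total (the initial start plus three restarts) before the breaker opens.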