intelligence-router/deploy/README.md

# LLM Sidecar — Deployment Guide

## Quick Install

On the Main PC, from the `intelligence-router/` directory (the commands below use paths relative to it):

```bash
# 1. Copy the service file
sudo cp deploy/llm-sidecar.service /etc/systemd/system/

# 2. Copy the manifest (adjust paths as needed)
mkdir -p /home/bigt/AI/llm
cp deploy/manifest.yaml /home/bigt/AI/llm/manifest.yaml

# 3. Create a .env for the sidecar (optional)
cat > /home/bigt/AI/llm/.env << 'EOF'
# Sidecar configuration
MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
SIDECAR_PORT=8081
EOF

# 4. Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llm-sidecar

# 5. Verify it's running
sudo systemctl status llm-sidecar
```

## Verify

```bash
# Check that the sidecar is responding
curl http://10.0.4.11:8081/models/available

# Check model status
curl http://10.0.4.11:8081/models/status

# Test the router
curl http://10.0.4.100:9001/v1/models
```
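
The same checks can be scripted. The following is a minimal sketch using only the Python standard library; the URLs are taken from the `curl` commands above, while the `probe` and `main` names and the 5-second timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Endpoints taken from the curl commands above.
ENDPOINTS = [
    "http://10.0.4.11:8081/models/available",
    "http://10.0.4.11:8081/models/status",
    "http://10.0.4.100:9001/v1/models",
]


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send one GET request and return (ok, detail).

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
            return 200 <= status < 300, f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return False, f"unreachable: {exc}"


def main():
    """Probe every endpoint and print one result line per URL."""
    for url in ENDPOINTS:
        ok, detail = probe(url)
        print(f"{'OK  ' if ok else 'FAIL'} {url} ({detail})")
```

Calling `main()` prints one line per endpoint, which makes it easy to see whether the sidecar or the router is the part that is not answering.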

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MANIFEST_PATH` | `/home/bigt/AI/llm/manifest.yaml` | Path to the YAML manifest file |
| `SIDECAR_PORT` | `8081` | Port the sidecar listens on |
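
To illustrate how the defaults in the table apply, here is a hypothetical sketch of resolving these two variables in Python. It is not the sidecar's actual code: the `load_settings` name and the port range check are assumptions, and only the variable names and default values come from the table.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "MANIFEST_PATH": "/home/bigt/AI/llm/manifest.yaml",
    "SIDECAR_PORT": "8081",
}


def load_settings(environ=None):
    """Resolve settings from the environment, falling back to the defaults."""
    env = os.environ if environ is None else environ
    # int() raises ValueError if the value is not numeric.
    port = int(env.get("SIDECAR_PORT", DEFAULTS["SIDECAR_PORT"]))
    if not 1 <= port <= 65535:
        raise ValueError(f"SIDECAR_PORT out of range: {port}")
    return {
        "manifest_path": env.get("MANIFEST_PATH", DEFAULTS["MANIFEST_PATH"]),
        "port": port,
    }
```

With neither variable set, `load_settings()` returns the default manifest path and port `8081`.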

### Manifest Format

```yaml
- id: model-id
  name: "Display Name"
  model_path: "/path/to/model.gguf"
  flags:          # Arbitrary llama-server flags
    n_ctx: 8192
    n_gpu_layers: 35
```

- `id`: Unique identifier, used in the `model` field of chat completion requests
- `name`: Human-readable display name
- `model_path`: Absolute path to the GGUF file
- `flags`: Any `llama-server` CLI flags (`n_ctx`, `n_gpu_layers`, etc.)
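
A manifest can be checked against the field descriptions above before the service is restarted. The sketch below is one way to do that, under these assumptions: the manifest is a top-level list of entries (as in the example), `flags` is optional, and PyYAML is installed for the file-loading helper. The function names are hypothetical, and the sidecar itself may apply different or stricter rules.

```python
import posixpath

REQUIRED_KEYS = ("id", "name", "model_path")


def validate_manifest(entries):
    """Return a list of problems found in a parsed manifest (empty if none)."""
    if not isinstance(entries, list):
        return ["top level must be a list of model entries"]
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        label = f"entry {index}"
        if not isinstance(entry, dict):
            problems.append(f"{label}: must be a mapping")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str) or not entry[key]:
                problems.append(f"{label}: '{key}' must be a non-empty string")
        model_id = entry.get("id")
        if isinstance(model_id, str) and model_id:
            if model_id in seen_ids:
                problems.append(f"{label}: duplicate id '{model_id}'")
            seen_ids.add(model_id)
        path = entry.get("model_path")
        # The manifest targets a Linux host, so use POSIX path rules.
        if isinstance(path, str) and path and not posixpath.isabs(path):
            problems.append(f"{label}: 'model_path' must be absolute")
        flags = entry.get("flags")  # assumed optional
        if flags is not None and not isinstance(flags, dict):
            problems.append(f"{label}: 'flags' must be a mapping")
    return problems


def validate_manifest_file(path):
    """Parse a YAML manifest from disk and validate it."""
    import yaml  # PyYAML, imported here so validate_manifest has no dependency

    with open(path, encoding="utf-8") as handle:
        return validate_manifest(yaml.safe_load(handle))
```

For example, `validate_manifest_file("/home/bigt/AI/llm/manifest.yaml")` returns an empty list when every entry passes these checks.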

## Managing the Service

```bash
# Start/Stop/Restart
sudo systemctl start llm-sidecar
sudo systemctl stop llm-sidecar
sudo systemctl restart llm-sidecar

# View logs
sudo journalctl -u llm-sidecar -f

# Check status
sudo systemctl status llm-sidecar

# Disable auto-start
sudo systemctl disable llm-sidecar
```

## Troubleshooting

- **Sidecar not starting**: Check the last 50 log lines with `sudo journalctl -u llm-sidecar -n 50`
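- **Manifest errors**: Check that the YAML parses (`python3 -c "import yaml; yaml.safe_load(open('/home/bigt/AI/llm/manifest.yaml'))"`, which requires PyYAML). The command prints nothing on success and a traceback on a syntax error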
- **llama-server crashes**: The sidecar restarts `llama-server` automatically, up to 3 times, before the circuit breaker opens and it stops retrying
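- **Port conflict**: Change `SIDECAR_PORT` in the service environment, then restart the service with `sudo systemctl restart llm-sidecar` (run `sudo systemctl daemon-reload` first if you edited the unit file)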
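
To confirm a suspected port conflict before changing `SIDECAR_PORT`, a quick check is to try binding the port yourself. This is a minimal sketch for a Linux host using the standard library; the `port_is_free` name is hypothetical, and the check only reflects the moment it runs.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # On Linux this avoids false "in use" results from lingering
        # TIME_WAIT connections; an active listener still blocks the bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            # Address in use, or not permitted (e.g. ports below 1024).
            return False
    return True
```

Run it while the sidecar is stopped: if `port_is_free(8081)` returns `False`, another process holds the port and a different `SIDECAR_PORT` is needed.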