
# LLM Sidecar — Deployment Guide

## Quick Install

On the Main PC:

```bash
# 1. Copy the service file
sudo cp deploy/llm-sidecar.service /etc/systemd/system/

# 2. Create the working directory and copy files
mkdir -p /home/bigt/AI/llm
cp deploy/manifest.yaml /home/bigt/AI/llm/manifest.yaml

# Copy the sidecar Python package (app.py + manifest.py).
# Copying into the parent directory keeps re-runs from nesting sidecar/sidecar/.
cp -r sidecar /home/bigt/AI/llm/

# Copy requirements.txt for the venv
cp requirements.txt /home/bigt/AI/llm/

# 3. Create a Python virtual environment with dependencies
python3 -m venv /home/bigt/AI/llm/venv
/home/bigt/AI/llm/venv/bin/pip install -r /home/bigt/AI/llm/requirements.txt

# 4. Create a .env for the sidecar (optional)
cat > /home/bigt/AI/llm/.env << 'EOF'
# Sidecar configuration
MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
SIDECAR_PORT=8080
EOF

# 5. Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llm-sidecar

# 6. Verify it's running
sudo systemctl status llm-sidecar
```

## Verify

```bash
# Check that the sidecar is responding (default SIDECAR_PORT is 8080)
curl http://10.0.4.11:8080/models/available

# Check model status
curl http://10.0.4.11:8080/models/status

# Test the router
curl http://10.0.4.100:9001/v1/models
```
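
The same checks can be scripted when they need to run repeatedly. The following is a minimal sketch, assuming each endpoint answers a plain GET with HTTP 200 when healthy; the helper names `fetch_status` and `check_endpoints` are hypothetical and not part of the sidecar or router.

```python
import urllib.error
import urllib.request

# Addresses taken from the curl commands above; the sidecar port assumes
# the default SIDECAR_PORT of 8080.
ENDPOINTS = [
    "http://10.0.4.11:8080/models/available",
    "http://10.0.4.11:8080/models/status",
    "http://10.0.4.100:9001/v1/models",
]


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def check_endpoints(urls, fetch=fetch_status):
    """Map each URL to True if it answered with HTTP 200, else False."""
    return {url: fetch(url) == 200 for url in urls}
```

Calling `check_endpoints(ENDPOINTS)` on a machine that can reach both hosts returns one boolean per URL. The `fetch` parameter exists only so the function can be exercised without network access.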

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MANIFEST_PATH` | `/home/bigt/AI/llm/manifest.yaml` | Path to the YAML manifest file |
| `SIDECAR_PORT` | `8080` | Port the sidecar listens on |
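
For illustration, the table above corresponds to a lookup like the one below. This is a sketch of how such settings are commonly read in Python, not the sidecar's actual code; the function name `load_settings` and the port range check are assumptions.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "MANIFEST_PATH": "/home/bigt/AI/llm/manifest.yaml",
    "SIDECAR_PORT": "8080",
}


def load_settings(env=None):
    """Read sidecar settings from env (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    manifest_path = env.get("MANIFEST_PATH", DEFAULTS["MANIFEST_PATH"])
    port = int(env.get("SIDECAR_PORT", DEFAULTS["SIDECAR_PORT"]))
    # Hypothetical sanity check: reject values outside the valid TCP port range.
    if not 1 <= port <= 65535:
        raise ValueError(f"SIDECAR_PORT out of range: {port}")
    return {"manifest_path": manifest_path, "port": port}
```

With no variables set, `load_settings({})` yields the documented defaults; setting `SIDECAR_PORT=9090` changes only the port.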

### Manifest Format

```yaml
- id: model-id
  name: "Display Name"
  model_path: "/path/to/model.gguf"
  flags:          # Arbitrary llama-server flags
    n_ctx: 8192
    n_gpu_layers: 35
```

- `id`: Unique identifier used in the `model` field of chat completion requests
- `name`: Human-readable display name
- `model_path`: Absolute path to the GGUF file
- `flags`: Any llama-server CLI flags (n_ctx, n_gpu_layers, etc.)
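
The field rules above can be checked before the sidecar is restarted. The sketch below works on an already parsed manifest (for example, the list returned by `yaml.safe_load`). It assumes `id`, `name`, and `model_path` are required and `flags` is optional; the sidecar's own validation may differ, and `validate_manifest` is a hypothetical helper.

```python
import posixpath

# Assumption: these three fields are mandatory, `flags` is optional.
REQUIRED_FIELDS = ("id", "name", "model_path")


def validate_manifest(entries):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(entries, list):
        return ["manifest must be a list of model entries"]

    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected a mapping")
            continue
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"entry {index}: missing '{field}'")

        model_id = entry.get("id")
        if isinstance(model_id, str):
            if model_id in seen_ids:
                problems.append(f"entry {index}: duplicate id '{model_id}'")
            seen_ids.add(model_id)

        path = entry.get("model_path")
        if isinstance(path, str) and not posixpath.isabs(path):
            problems.append(f"entry {index}: model_path is not absolute")

        if not isinstance(entry.get("flags", {}), dict):
            problems.append(f"entry {index}: flags must be a mapping")
    return problems
```

An entry shaped like the example above passes with no problems, while a second entry that reuses the same `id` or gives a relative `model_path` is reported.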

## Managing the Service

```bash
# Start/Stop/Restart
sudo systemctl start llm-sidecar
sudo systemctl stop llm-sidecar
sudo systemctl restart llm-sidecar

# View logs
sudo journalctl -u llm-sidecar -f

# Check status
sudo systemctl status llm-sidecar

# Disable auto-start
sudo systemctl disable llm-sidecar
```

## Troubleshooting

- **Sidecar not starting**: Check `sudo journalctl -u llm-sidecar -n 50`
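- **Manifest errors**: Check that the deployed manifest is valid YAML (`python3 -c "import yaml; yaml.safe_load(open('/home/bigt/AI/llm/manifest.yaml'))"`); the command prints nothing on success and a traceback on a syntax error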
- **llama-server crashes**: Sidecar auto-restarts it up to 3 times before the circuit breaker opens
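- **Port conflict**: Change `SIDECAR_PORT` in the service environment, then run `sudo systemctl daemon-reload` and `sudo systemctl restart llm-sidecar`; any client that points at the old port (such as the router's `SIDECAR_URL`) must be updated to match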
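
To confirm a suspected port conflict before changing `SIDECAR_PORT`, a quick bind test is enough. This is a minimal sketch using only the standard library; `port_is_free` is a hypothetical helper, and the default host of `0.0.0.0` is an assumption about which interface the sidecar binds to.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process owns the port.
            return False
    return True
```

Run it on the Main PC while the service is stopped: `port_is_free(8080)` returning `False` means another process already holds the port.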