intelligence-router/deploy/llm-sidecar.service

[Unit]
Description=LLM Sidecar Service — manages llama-server subprocess
After=network.target

[Service]
Type=simple
User=bigt
WorkingDirectory=/home/bigt/AI/llm

# Environment
EnvironmentFile=-/home/bigt/AI/llm/.env
Environment=MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
Environment=SIDECAR_PORT=8080
Environment=PATH=/home/bigt/AI/llm/venv/bin:/usr/local/bin:/usr/bin:/bin
Environment=PYTHONUNBUFFERED=1

# Use the sidecar's venv — install deps via deploy/README.md
ExecStart=/home/bigt/AI/llm/venv/bin/uvicorn sidecar.app:app --host 0.0.0.0 --port ${SIDECAR_PORT}
Restart=always
RestartSec=3

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=llm-sidecar

# Security hardening (optional, adjust as needed)
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/home/bigt/AI/llm
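# Further sandboxing directives that are sometimes added alongside the ones above.
# This is a sketch, not part of the deployed unit: the lines are left commented out,
# and whether the sidecar and llama-server tolerate each of them on this host is an
# assumption that needs testing. PrivateDevices= is deliberately not listed, since it
# would hide GPU device nodes if llama-server uses a GPU (also an assumption).
# PrivateTmp=true
# ProtectKernelTunables=true
# ProtectControlGroups=true
# RestrictSUIDSGID=true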

[Install]
WantedBy=multi-user.target
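
# Typical install and inspection commands, as a hedged sketch. The target path
# /etc/systemd/system/ and running from the repository root are assumptions;
# deploy/README.md remains the authoritative procedure.
#   sudo cp deploy/llm-sidecar.service /etc/systemd/system/llm-sidecar.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now llm-sidecar.service
#   journalctl -u llm-sidecar.service -f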

# Change history (from git blame; all commits dated 2026-06-15, +03:00)
#
# 04:13:36  Epic: Model Switching via Sidecar - Issues #4-#7 + #8 deployment
#           (introduced this unit; covers every line not listed under a later commit)
#
#   Issue #4: Automatic model detection and switch
#     - Router extracts model from chat body, queries sidecar, triggers switch on mismatch
#     - Matching active model routes directly to Main PC
#     - No active model triggers cold start switch
#     - Tests: 4 test_router_model_detection.py
#
#   Issue #5: SSE switch progress feedback
#     - _sse_format() correctly serializes SSE events
#     - sse_progress_stream() generates phase progression events
#     - Proxy yields SSE events then actual response
#     - Tests: 3 test_router_sse_progress.py
#
#   Issue #6: Circuit breaker + OpenRouter fallback
#     - Circuit tracks Sidecar failures, opens after MAX_RECOVERY_ATTEMPTS (3)
#     - OpenRouter API key from env, no longer uses x-intelligence-level header
#     - Fixes: OPENROUTER_BASE, SSE format, circuit state isolation
#     - Tests: 7 test_router_circuit_breaker.py
#
#   Issue #7: LXC fallback chain completion
#     - Full fallback: Main PC → OpenRouter → LXC
#     - Each backend health-checked via /v1/models before routing
#     - All backends down → 503 response
#     - Fixed: execute() wrapped in try/except to trigger fallback chain
#     - Tests: 3 test_router_fallback_lxc.py
#
#   Issue #8: Systemd service deployment
#     - deploy/llm-sidecar.service: systemd unit with Restart=always
#     - deploy/manifest.yaml: example manifest with 3 profiles
#     - deploy/README.md: deployment instructions
#     - Updated: docker-compose.yml, requirements.txt, Dockerfile
#
#   Test framework improvements:
#     - tests/conftest.py: shared URL patches for all router tests
#     - Fixed global state pollution in circuit breaker tests
#     - Fixed test sidecar switch test (AsyncMock for async function)
#
#   Total: 42 tests passing
#
# 16:02:34  fix: use venv for sidecar deps, add missing deploy steps
#           (Environment=PATH line and the "Use the sidecar's venv" comment)
#     - llm-sidecar.service: use /home/bigt/AI/llm/venv/bin/uvicorn instead of
#       global python3 -m uvicorn (avoids 'No module named uvicorn' error)
#     - deploy/README.md: add steps to copy sidecar/ package, create venv,
#       and pip install requirements.txt
#
# 16:07:18  fixed port and conflict
#           (ExecStart line)
#
# 16:16:47  fix: change sidecar port from 8081 to 8080
#           (Environment=SIDECAR_PORT line)
#     The sidecar is deployed on port 8080 instead of 8081. Update all:
#     - Default SIDECAR_PORT in sidecar/app.py
#     - Default SIDECAR_URL in main.py (router)
#     - deploy/llm-sidecar.service Environment
#     - deploy/README.md (.env example + config table)
#     - All 7 test files (conftest, circuit-breaker, fallback, queue,
#       model-detection, sse-progress, v1-models)
#
# 19:25:58  fix: unbuffer sidecar stdout so logs appear in journalctl
#           (Environment=PYTHONUNBUFFERED line)