intelligence-router/deploy/manifest.yaml
root 4914363089 Epic: Model Switching via Sidecar — Issues #4-#7 + #8 deployment
Issue #4: Automatic model detection and switch
- Router extracts model from chat body, queries sidecar, triggers switch on mismatch
- Matching active model routes directly to Main PC
- No active model triggers cold start switch
- Tests: 4 test_router_model_detection.py
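
The three outcomes listed for Issue #4 can be sketched as a small decision function. This is a minimal illustration, not the router's actual code: the names `Action`, `extract_model`, and `decide` are hypothetical, and the sketch assumes an OpenAI-style chat body with a top-level "model" field and a sidecar that reports either the active model id or nothing.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ROUTE_DIRECT = "route_direct"  # requested model is already active
    SWITCH = "switch"              # a different model is active
    COLD_START = "cold_start"      # no model is active yet


def extract_model(chat_body: dict) -> Optional[str]:
    """Return the "model" field of the chat body, or None if absent or empty."""
    model = chat_body.get("model")
    return model if isinstance(model, str) and model else None


def decide(requested: str, active: Optional[str]) -> Action:
    """Compare the requested model with the model the sidecar reports as active."""
    if active is None:
        return Action.COLD_START
    if active == requested:
        return Action.ROUTE_DIRECT
    return Action.SWITCH
```

For example, `decide("qwen-3-8b", "llama-4-maverick")` yields `Action.SWITCH`, while the same request with no active model yields `Action.COLD_START`.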

Issue #5: SSE switch progress feedback
- _sse_format() correctly serializes SSE events
- sse_progress_stream() generates phase progression events
- Proxy yields SSE events then actual response
- Tests: 3 test_router_sse_progress.py
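
To illustrate the SSE wire format behind Issue #5, here is a rough sketch of serializing progress events. The helper names `format_sse` and `progress_events`, the event name `switch_progress`, and the payload fields are all assumptions made for this example; the real `_sse_format()` and `sse_progress_stream()` may differ.

```python
import json
from typing import Iterator, List, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event: optional "event:" line, a "data:" line,
    then a blank line that terminates the event."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line, so one "data:" field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def progress_events(phases: List[str]) -> Iterator[str]:
    """Yield one SSE event per switch phase, in order."""
    total = len(phases)
    for step, phase in enumerate(phases, start=1):
        payload = {"phase": phase, "step": step, "total": total}
        yield format_sse(payload, event="switch_progress")
```

A proxy could yield these strings to the client first and then stream the actual model response, as the commit describes.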

Issue #6: Circuit breaker + OpenRouter fallback
- Circuit breaker tracks sidecar failures and opens after MAX_RECOVERY_ATTEMPTS (3)
- OpenRouter API key from env, no longer uses x-intelligence-level header
- Fixes: OPENROUTER_BASE, SSE format, circuit state isolation
- Tests: 7 test_router_circuit_breaker.py
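
The circuit breaker behaviour in Issue #6 might look roughly like the sketch below. Only `MAX_RECOVERY_ATTEMPTS = 3` comes from the commit message; the class shape, the method names, and the choice to count consecutive failures and reset on success are assumptions. State lives on the instance rather than in module globals, which is one way to get the per-test isolation the commit mentions.

```python
MAX_RECOVERY_ATTEMPTS = 3  # value stated in the commit message


class CircuitBreaker:
    """Counts consecutive sidecar failures; open means the sidecar is skipped."""

    def __init__(self, max_failures: int = MAX_RECOVERY_ATTEMPTS) -> None:
        self.max_failures = max_failures
        self.failures = 0  # per-instance state, not shared between breakers

    @property
    def is_open(self) -> bool:
        # Assumption: the circuit opens once the failure count reaches the limit.
        return self.failures >= self.max_failures

    def record_failure(self) -> None:
        self.failures += 1

    def record_success(self) -> None:
        # Assumption: any success closes the circuit again.
        self.failures = 0
```

While `is_open` is true, a router built this way would send requests to the fallback backend instead of retrying the sidecar.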

Issue #7: LXC fallback chain completion
- Full fallback: Main PC → OpenRouter → LXC
- Each backend health-checked via /v1/models before routing
- All backends down → 503 response
- Fixed: execute() wrapped in try/except to trigger fallback chain
- Tests: 3 test_router_fallback_lxc.py
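
The fallback chain from Issue #7 can be sketched as a loop over backends in priority order. This is an illustration under assumptions: `route_with_fallback` is a hypothetical name, the health check and request execution are passed in as plain callables so the sketch stays free of HTTP details, and the response bodies are invented for the example.

```python
from typing import Callable, Iterable, Tuple

Backend = Tuple[str, str]  # (name, base URL); the URLs are placeholders


def route_with_fallback(
    backends: Iterable[Backend],
    is_healthy: Callable[[str], bool],
    execute: Callable[[str], dict],
) -> Tuple[int, dict]:
    """Try each backend in order and return (status code, body)."""
    for name, base_url in backends:
        # Health check against the backend's /v1/models endpoint.
        if not is_healthy(f"{base_url}/v1/models"):
            continue
        try:
            return 200, {"backend": name, "result": execute(base_url)}
        except Exception:
            # A failure during execution falls through to the next backend.
            continue
    # Every backend was unhealthy or failed.
    return 503, {"error": "all backends unavailable"}
```

Wrapping `execute` in try/except is what lets a backend that passes its health check but then fails still hand the request on to the next one in the chain.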

Issue #8: Systemd service deployment
- deploy/llm-sidecar.service: systemd unit with Restart=always
- deploy/manifest.yaml: example manifest with 3 profiles
- deploy/README.md: deployment instructions
- Updated: docker-compose.yml, requirements.txt, Dockerfile

Test framework improvements:
- tests/conftest.py: shared URL patches for all router tests
- Fixed global state pollution in circuit breaker tests
- Fixed sidecar switch test (AsyncMock for async function)

Total: 42 tests passing
2026-06-15 01:13:36 +00:00


# LLM Model Manifest
# Each profile defines a named model configuration for llama-server.
# The sidecar reads this file on every request — no restart needed.
#
# Usage:
# 1. Edit this file with available GGUFs and desired parameters
# 2. The sidecar automatically picks up changes
# 3. Use the Hermes model picker to switch models
- id: qwen-3-8b
  name: "Qwen 3 8B"
  model_path: "/home/bigt/AI/llm/qwen/qwen3-8b-q4.gguf"
  flags:
    n_ctx: 8192
    n_gpu_layers: 35
- id: qwen-3-8b-long
  name: "Qwen 3 8B (Long Context)"
  model_path: "/home/bigt/AI/llm/qwen/qwen3-8b-q4.gguf"
  flags:
    n_ctx: 32768
    n_gpu_layers: 20
- id: llama-4-maverick
  name: "Llama 4 Maverick"
  model_path: "/home/bigt/AI/llm/llama4/llama4-maverick-q4.gguf"
  flags:
    n_ctx: 8192
    n_gpu_layers: 35
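
As an illustration of how a profile from this manifest could be turned into a llama-server command line, here is a minimal sketch. The function name `build_llama_server_args` and the mapping from manifest keys to flags are assumptions, not the sidecar's confirmed behaviour; `--model`, `--ctx-size`, and `--n-gpu-layers` are existing llama-server options.

```python
from typing import List

# Assumed mapping from manifest flag names to llama-server options.
FLAG_MAP = {
    "n_ctx": "--ctx-size",
    "n_gpu_layers": "--n-gpu-layers",
}


def build_llama_server_args(profile: dict) -> List[str]:
    """Build an argv list for llama-server from one parsed manifest profile."""
    for key in ("id", "model_path"):
        if key not in profile:
            raise ValueError(f"profile is missing required key: {key}")
    args = ["llama-server", "--model", profile["model_path"]]
    for key, value in profile.get("flags", {}).items():
        if key not in FLAG_MAP:
            # Reject unknown keys instead of passing them through blindly.
            raise ValueError(f"unsupported flag in profile {profile['id']}: {key}")
        args += [FLAG_MAP[key], str(value)]
    return args
```

The profile dict here is what a YAML parser would return for one list entry of the manifest. Rejecting unknown flag names is a design choice for the sketch, so that a typo in the manifest fails loudly rather than silently starting the server with defaults.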