intelligence-router/deploy/manifest.yaml

# LLM Model Manifest
# Each profile defines a named model configuration for llama-server.
# The sidecar re-reads this file on every request, so no restart is needed.
#
# Usage:
#   1. Edit this file to list the available GGUF files and their desired parameters
#   2. The sidecar automatically picks up changes
#   3. Use the Hermes model picker to switch models

- id: qwen-3-8b
  name: "Qwen 3 8B"
  model_path: "/home/bigt/AI/llm/qwen/qwen3-8b-q4.gguf"
  flags:
    n_ctx: 8192
    n_gpu_layers: 35

- id: qwen-3-8b-long
  name: "Qwen 3 8B (Long Context)"
  model_path: "/home/bigt/AI/llm/qwen/qwen3-8b-q4.gguf"
  flags:
    n_ctx: 32768
    n_gpu_layers: 20

- id: llama-4-maverick
  name: "Llama 4 Maverick"
  model_path: "/home/bigt/AI/llm/llama4/llama4-maverick-q4.gguf"
  flags:
    n_ctx: 8192
    n_gpu_layers: 35
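The header comments say the sidecar re-reads this manifest on every request. Below is a minimal Python sketch of how that lookup might work. It is not the sidecar's actual code. It assumes PyYAML for parsing. It also assumes the sidecar maps `model_path`, `n_ctx`, and `n_gpu_layers` onto llama-server's `--model`, `--ctx-size`, and `--n-gpu-layers` options. The names `load_manifest`, `find_profile`, `build_command`, and `FLAG_TO_OPTION` are hypothetical.

```python
from pathlib import Path
from typing import Any, Union

# Assumed mapping from manifest flag names to llama-server CLI options (hypothetical;
# the real sidecar may translate flags differently).
FLAG_TO_OPTION = {
    "n_ctx": "--ctx-size",
    "n_gpu_layers": "--n-gpu-layers",
}

REQUIRED_KEYS = ("id", "name", "model_path")


def load_manifest(path: Union[str, Path]) -> list[dict[str, Any]]:
    """Parse the manifest from disk. Calling this per request means edits apply without a restart."""
    import yaml  # PyYAML, an assumed dependency; imported lazily so the helpers below work without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or []


def find_profile(profiles: list[dict[str, Any]], profile_id: str) -> dict[str, Any]:
    """Return the profile with the given id after checking that its required keys are present."""
    for profile in profiles:
        if profile.get("id") == profile_id:
            missing = [key for key in REQUIRED_KEYS if key not in profile]
            if missing:
                raise ValueError(f"profile {profile_id!r} is missing {missing}")
            return profile
    raise KeyError(f"no profile with id {profile_id!r}")


def build_command(profile: dict[str, Any], binary: str = "llama-server") -> list[str]:
    """Translate one profile into an argv list for llama-server, rejecting unknown flags."""
    argv = [binary, "--model", profile["model_path"]]
    for flag, value in profile.get("flags", {}).items():
        option = FLAG_TO_OPTION.get(flag)
        if option is None:
            raise ValueError(f"unsupported flag {flag!r}")
        argv += [option, str(value)]
    return argv
```

A per-request lookup would then be `build_command(find_profile(load_manifest(path), requested_id))`. For the `qwen-3-8b-long` profile, the result is `llama-server --model /home/bigt/AI/llm/qwen/qwen3-8b-q4.gguf --ctx-size 32768 --n-gpu-layers 20`. The sketch rejects unknown flag names rather than passing them through. That way a typo in the manifest fails at switch time instead of producing an unintended server configuration.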

Epic: Model Switching via Sidecar (Issues #4-#7, plus #8 deployment)

Issue #4: Automatic model detection and switch
- The router extracts the model from the chat body, queries the sidecar, and triggers a switch on a mismatch
- A matching active model routes directly to the Main PC
- No active model triggers a cold-start switch
- Tests: 4 in test_router_model_detection.py

Issue #5: SSE switch progress feedback
- _sse_format() correctly serializes SSE events
- sse_progress_stream() generates phase-progression events
- The proxy yields the SSE events, then the actual response
- Tests: 3 in test_router_sse_progress.py

Issue #6: Circuit breaker + OpenRouter fallback
- The circuit tracks sidecar failures and opens after MAX_RECOVERY_ATTEMPTS (3)
- The OpenRouter API key is read from the environment; the x-intelligence-level header is no longer used
- Fixes: OPENROUTER_BASE, SSE format, circuit state isolation
- Tests: 7 in test_router_circuit_breaker.py

Issue #7: LXC fallback chain completion
- Full fallback: Main PC → OpenRouter → LXC
- Each backend is health-checked via /v1/models before routing
- All backends down → 503 response
- Fixed: execute() is wrapped in try/except so that failures trigger the fallback chain
- Tests: 3 in test_router_fallback_lxc.py

Issue #8: Systemd service deployment
- deploy/llm-sidecar.service: systemd unit with Restart=always
- deploy/manifest.yaml: example manifest with 3 profiles
- deploy/README.md: deployment instructions
- Updated: docker-compose.yml, requirements.txt, Dockerfile

Test framework improvements:
- tests/conftest.py: shared URL patches for all router tests
- Fixed global state pollution in the circuit breaker tests
- Fixed the sidecar switch test (AsyncMock for an async function)

Total: 42 tests passing
2026-06-15 04:13:36 +03:00
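Issue #4 in the commit notes above describes three outcomes for an incoming chat request. Here is a minimal sketch of that decision. It assumes an OpenAI-style request body with a top-level `model` field. It also assumes the sidecar reports the active profile id, or `None` when nothing is loaded. `Route`, `extract_model`, and `decide_route` are hypothetical names, not the router's own.

```python
from enum import Enum
from typing import Any, Optional


class Route(Enum):
    DIRECT = "direct"          # requested model already active: proxy straight to the Main PC
    SWITCH = "switch"          # a different model is active: ask the sidecar to switch first
    COLD_START = "cold_start"  # nothing is active: start the requested model from scratch


def extract_model(body: dict[str, Any]) -> str:
    """Pull the requested model id out of an OpenAI-style chat request body (assumed shape)."""
    model = body.get("model")
    if not isinstance(model, str) or not model:
        raise ValueError("chat request has no 'model' field")
    return model


def decide_route(body: dict[str, Any], active_model: Optional[str]) -> tuple[Route, str]:
    """Compare the requested model with the sidecar's active model (hypothetical sidecar reply)."""
    requested = extract_model(body)
    if active_model is None:
        return Route.COLD_START, requested
    if active_model == requested:
        return Route.DIRECT, requested
    return Route.SWITCH, requested
```

Returning the requested id alongside the route spares the caller a second parse when it asks the sidecar to switch.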
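Issue #6 in the commit notes above says the circuit opens after `MAX_RECOVERY_ATTEMPTS` (3) sidecar failures. The notes also list fixes for circuit state isolation and for global state leaking between tests. The sketch below is a per-instance breaker rather than module-level globals. The cool-down period, the half-open behaviour, and the injectable `clock` are assumptions, not taken from the source; the clock exists so tests need no sleeps. `CircuitBreaker` is a hypothetical class name.

```python
import time
from typing import Callable, Optional

MAX_RECOVERY_ATTEMPTS = 3  # threshold named in the commit notes


class CircuitBreaker:
    """Per-instance breaker for sidecar calls; state lives on the object, not in module globals."""

    def __init__(
        self,
        max_failures: int = MAX_RECOVERY_ATTEMPTS,
        reset_after: float = 30.0,  # assumed cool-down in seconds; not specified in the source
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        """True while the breaker is tripped and the cool-down has not yet elapsed."""
        if self.opened_at is None:
            return False
        # After the cool-down, report closed so one trial request can probe the sidecar (half-open).
        return self._clock() - self.opened_at < self.reset_after

    def record_success(self) -> None:
        """A successful sidecar call fully resets the breaker."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; once the threshold is reached, (re)open the circuit from now."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self._clock()
```

While `is_open` is true, the router would skip the sidecar and go straight to the OpenRouter fallback. A failed trial request after the cool-down re-opens the circuit immediately, because the failure count is still at or above the threshold.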
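Issue #7 in the commit notes above describes the full chain Main PC → OpenRouter → LXC. Each backend is health-checked via `/v1/models`, failures in `execute()` are caught so the chain continues, and a 503 is returned when everything is down. The sketch below shows that shape. The health check and the request executor are passed in as callables so the logic can be exercised without a network. `Backend`, `http_health_check`, and `route_with_fallback` are hypothetical names, and any base URLs are placeholders.

```python
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Backend:
    name: str
    base_url: str  # placeholder; real base URLs come from the router's configuration


def http_health_check(backend: Backend, timeout: float = 2.0) -> bool:
    """Treat a backend as healthy if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{backend.base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False


def route_with_fallback(
    backends: list[Backend],
    request: dict[str, Any],
    execute: Callable[[Backend, dict[str, Any]], Any],
    is_healthy: Callable[[Backend], bool] = http_health_check,
) -> tuple[int, Any]:
    """Try each backend in order; return (200, result) or (503, error details) if all fail."""
    errors = []
    for backend in backends:
        if not is_healthy(backend):
            errors.append(f"{backend.name}: health check failed")
            continue
        try:
            return 200, execute(backend, request)
        except Exception as exc:  # any failure falls through to the next backend
            errors.append(f"{backend.name}: {exc}")
    return 503, {"error": "all backends unavailable", "details": errors}
```

The broad `except` is what lets a failing `execute()` fall through to the next backend, matching the fix described in the commit notes. A production version would likely log each collected error rather than only returning it.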