# Intelligence Router: Context & Glossary

## Terminology

| Term | Definition |
|------|------------|
| **Router** | The FastAPI proxy running in Docker (`10.0.4.100:9001`). Intercepts LLM requests, checks the active model, and routes accordingly. |
| **Sidecar** | A lightweight Python service running on the Main PC under systemd. Manages the llama-server subprocess and serves manifest/profile data. |
| **Profile** | A named model configuration from the manifest. Contains a model path, a display name, and arbitrary llama-server flags. A single GGUF can have multiple profiles. |
| **Manifest** | A YAML file on the Main PC (`/home/bigt/AI/llm/manifest.yaml`) that lists all available profiles. Source of truth for which models Hermes sees. |
| **Model Switch** | The destructive handoff process: stop the current llama-server, start a new one with the chosen profile's flags, and wait for readiness. |
| **Active Model** | The profile currently loaded in llama-server. Queried from the sidecar before each request. |
| **Fallback** | The LXC container (`10.0.4.200`) running a fixed model. Pure fallback: no switching, no sidecar. Always-on safety net. |
| **Queue** | In-memory request buffer held during a model switch. Hard cap: 120 seconds. Drains once the sidecar reports ready. |

## Architecture

```
Hermes (Desktop App)
    ↕ (OpenAI-compatible API)
Intelligence Router (Docker, 10.0.4.100:9001)
    ├─→ Sidecar (Main PC, 10.0.4.11:8081): model switching, manifest, status
    ├─→ OpenRouter (DeepSeek V4 Flash): after 3 failed sidecar recoveries
    └─→ Fallback SLM (LXC, 10.0.4.200): out-of-credits safety net
```

## Decisions

- **Manifest over scan**: profiles are explicitly listed, not discovered by a filesystem walk. This allows multiple configurations per GGUF.
- **Flexible flags**: each profile carries an arbitrary `flags` dict. There is no predetermined set of parameters.
- **Stateless routing**: the router asks the sidecar for the active model before every request. No state is cached locally.
- **Cold start**: the sidecar starts with no model loaded. The user picks a profile from the Hermes model picker.
- **Queue on switch**: the first request triggers a switch; subsequent requests queue. Hard cap: 120s.
- **SSE feedback**: the router injects an `event: model_switching` SSE event so Hermes shows progress instead of a blank spinner.
- **LXC as pure fallback**: no switching, no sidecar. Out-of-credits safety net.
- **Sidecar as systemd service**: auto-restarts on crash, starts at boot, loads no default model.
- **Circuit breaker**: the sidecar auto-restarts llama-server up to 3 times on crash; after that, the router falls back to OpenRouter.
- **Queue cap**: at most 10 queued requests, with a 120s hard timeout. Requests beyond capacity get `429`.
- **Readiness detection**: the sidecar polls `localhost:8080/v1/models` every 500ms and unblocks the queue once it returns `200`.
- **Switch lock**: an in-memory lock prevents concurrent switches. Requests that arrive during a switch join the queue.
- **Custom provider in Hermes**: the router is registered as a `custom` provider with `base_url: http://10.0.4.100:9001/v1`. No auth.
- **OpenRouter stripped from direct routing**: the old `x-intelligence-level: High` rule is removed. OpenRouter is a fallback backend, not a direct routing target.
- **OpenRouter key**: stored in the router's `.env` as `OPENROUTER_API_KEY`.
- **Fallback chain**: Main PC → OpenRouter → LXC. Each level is tried only if the previous one fails.
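The queue, switch-lock, queue-cap, and readiness rules above can be sketched as a single asyncio coordinator. This is a minimal sketch, not the code in `main.py`: `SwitchCoordinator`, `ensure`, `QueueFull`, and the injected `start_switch`/`probe_ready` callables are hypothetical names, and the HTTP calls are stubbed so the logic runs on its own. For illustration it folds the sidecar's readiness polling and the router's queue into one object. Queued requests only wait for the in-flight switch to finish; in keeping with stateless routing, the caller is assumed to re-check the active model afterwards.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class QueueFull(Exception):
    """Queue is at capacity; the router would translate this into HTTP 429."""


class SwitchCoordinator:
    """Serialises model switches and parks requests that arrive mid-switch."""

    def __init__(
        self,
        start_switch: Callable[[str], Awaitable[None]],
        probe_ready: Callable[[], Awaitable[int]],
        max_queued: int = 10,          # "Queue cap": at most 10 waiting requests
        timeout_s: float = 120.0,      # "Queue cap": 120 s hard timeout
        poll_interval_s: float = 0.5,  # "Readiness detection": poll every 500 ms
    ) -> None:
        self._start_switch = start_switch  # hypothetical: stop old llama-server, launch new one
        self._probe_ready = probe_ready    # hypothetical: HTTP status of GET localhost:8080/v1/models
        self._max_queued = max_queued
        self._timeout_s = timeout_s
        self._poll_interval_s = poll_interval_s
        self._lock = asyncio.Lock()        # "Switch lock": one switch at a time
        self._switch_done = asyncio.Event()
        self._waiting = 0
        self.active: Optional[str] = None  # "Cold start": nothing loaded yet

    async def _wait_until_ready(self) -> None:
        # Keep polling until llama-server answers 200.
        while await self._probe_ready() != 200:
            await asyncio.sleep(self._poll_interval_s)

    async def ensure(self, profile: str) -> None:
        """Return once `profile` is active or the in-flight switch has finished."""
        if self._lock.locked():
            # A switch is already running: join the queue, or reject when full.
            if self._waiting >= self._max_queued:
                raise QueueFull(profile)
            self._waiting += 1
            try:
                await asyncio.wait_for(self._switch_done.wait(), self._timeout_s)
            finally:
                self._waiting -= 1
            return
        if self.active == profile:
            return  # already loaded: no switch needed
        async with self._lock:
            self._switch_done.clear()
            try:
                await self._start_switch(profile)
                await asyncio.wait_for(self._wait_until_ready(), self._timeout_s)
                self.active = profile
            finally:
                # Wake queued requests whether the switch succeeded or not.
                self._switch_done.set()


async def demo() -> tuple:
    started = []
    statuses = iter([503, 503, 200])  # fake probe: two misses, then ready

    async def start_switch(profile: str) -> None:
        started.append(profile)

    async def probe_ready() -> int:
        return next(statuses)

    coord = SwitchCoordinator(start_switch, probe_ready, max_queued=1, poll_interval_s=0.01)
    first = asyncio.create_task(coord.ensure("qwen-coder"))   # triggers the switch
    await asyncio.sleep(0)
    second = asyncio.create_task(coord.ensure("qwen-coder"))  # joins the queue
    await asyncio.sleep(0)
    try:
        await coord.ensure("qwen-coder")                      # queue is full
        rejected = False
    except QueueFull:
        rejected = True
    await asyncio.gather(first, second)
    await coord.ensure("qwen-coder")  # already active: no second switch
    return started, coord.active, rejected


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The profile name `qwen-coder` is made up. In the demo the queue cap is lowered to 1 so the third concurrent request is rejected, which is where the router would return `429`.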
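For the SSE feedback decision, a `model_switching` event is an ordinary Server-Sent Events frame: an `event:` line, a `data:` line, and a terminating blank line. The helper name `sse_event` and the payload fields (`target`, `status`) are assumptions; this document does not specify what the router puts in `data`.

```python
import json
from typing import Any, Dict


def sse_event(event: str, payload: Dict[str, Any]) -> str:
    """Format one Server-Sent Events frame with a named event and a JSON payload."""
    # json.dumps escapes newlines inside strings, so the payload always fits on a
    # single `data:` line; the trailing blank line terminates the event.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


if __name__ == "__main__":
    # Hypothetical payload the router might emit while a profile is loading.
    print(sse_event("model_switching", {"target": "qwen-coder", "status": "loading"}), end="")
```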
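The fallback chain and circuit breaker can be sketched as a loop over ordered backends. This is an illustrative router-side approximation, assuming each backend is a callable that raises on failure. Per the Decisions list, the three llama-server restarts actually happen inside the sidecar; here the breaker simply stops trying the Main PC after three consecutive failures. `route`, `CircuitBreaker`, `BackendError`, the `reset` hook, and the backend names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Backend = Callable[[Dict[str, Any]], Dict[str, Any]]


class BackendError(Exception):
    """Raised by a backend that cannot serve the request."""


class CircuitBreaker:
    """Trips after `max_failures` consecutive failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

    def reset(self) -> None:
        # Hypothetical hook: call when the sidecar reports llama-server healthy again.
        self.failures = 0


def route(request: Dict[str, Any], chain: List[Tuple[str, Backend]],
          breakers: Dict[str, CircuitBreaker]) -> Tuple[str, Dict[str, Any]]:
    """Try backends in order (Main PC, OpenRouter, LXC); return the first success."""
    errors: List[str] = []
    for name, call in chain:
        breaker = breakers.get(name)
        if breaker is not None and breaker.is_open:
            errors.append(f"{name}: circuit open")  # skip without trying
            continue
        try:
            response = call(request)
        except BackendError as exc:
            if breaker is not None:
                breaker.record(ok=False)
            errors.append(f"{name}: {exc}")
            continue
        if breaker is not None:
            breaker.record(ok=True)
        return name, response
    raise BackendError("all backends failed: " + "; ".join(errors))


def demo() -> tuple:
    calls: List[str] = []

    def main_pc(req: Dict[str, Any]) -> Dict[str, Any]:
        calls.append("main-pc")
        raise BackendError("llama-server down")

    def openrouter(req: Dict[str, Any]) -> Dict[str, Any]:
        return {"served_by": "openrouter"}

    def out_of_credits(req: Dict[str, Any]) -> Dict[str, Any]:
        raise BackendError("out of credits")

    def lxc(req: Dict[str, Any]) -> Dict[str, Any]:
        return {"served_by": "lxc"}

    breakers = {"main-pc": CircuitBreaker(max_failures=3)}
    chain = [("main-pc", main_pc), ("openrouter", openrouter), ("lxc", lxc)]
    served = [route({}, chain, breakers)[0] for _ in range(4)]
    # Breaker open and OpenRouter failing: the LXC safety net answers.
    broke = [("main-pc", main_pc), ("openrouter", out_of_credits), ("lxc", lxc)]
    last = route({}, broke, breakers)[0]
    return served, calls.count("main-pc"), breakers["main-pc"].is_open, last


if __name__ == "__main__":
    print(demo())
```

In the demo, the Main PC is attempted only three times; the fourth and fifth requests skip it entirely, and the last one lands on the LXC because the simulated OpenRouter backend is out of credits.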
## Implementation Files

| File | Purpose |
|------|---------|
| `main.py` | Router: FastAPI proxy with routing, queue, circuit breaker, and fallback chain |
| `sidecar/app.py` | Sidecar: FastAPI service for model management |
| `sidecar/manifest.py` | Sidecar manifest YAML loading and validation |
| `deploy/llm-sidecar.service` | Systemd service unit file for the sidecar |
| `deploy/manifest.yaml` | Example manifest file |
| `deploy/README.md` | Deployment instructions |

## API Endpoints

### Sidecar (`10.0.4.11:8081`)

- `GET /models/available`: list all manifest profiles
- `GET /models/status`: current active model status
- `POST /models/switch`: switch to a different model profile

### Router (`10.0.4.100:9001`)

- `GET /v1/models`: OpenAI-compatible model list (proxied from the sidecar)
- `GET /models/status`: proxy to sidecar status
- `POST /models/switch`: proxy to sidecar switch
- `GET /health`: router health check
- `/{path:path}`: smart proxy with automatic switching and fallback
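One possible shape for the validation in `sidecar/manifest.py`: check the already-parsed YAML (for example, the dict returned by PyYAML's `yaml.safe_load`) and turn a profile's free-form `flags` dict into a llama-server command line. The manifest keys (`profiles`, `name`, `model`, `display_name`, `flags`), the GGUF path, and the profile names are assumptions. The only facts taken from this document are that each profile has a model path, a display name, and an arbitrary `flags` dict; that one GGUF may back several profiles; and that llama-server listens on `localhost:8080`. `--model`, `--host`, `--port`, `--ctx-size`, `--n-gpu-layers`, and `--mlock` are standard llama-server options.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Profile:
    name: str
    model: str          # path to the GGUF file
    display_name: str
    flags: Dict[str, Any] = field(default_factory=dict)

    def llama_server_args(self, host: str = "127.0.0.1", port: int = 8080) -> List[str]:
        """Build the llama-server argv from the profile's free-form flags."""
        args = ["llama-server", "--model", self.model, "--host", host, "--port", str(port)]
        for key, value in self.flags.items():
            if value is True:
                args.append(f"--{key}")               # bare switch, e.g. mlock: true
            elif value is False or value is None:
                continue                              # disabled flag: omit it
            else:
                args.extend([f"--{key}", str(value)])
        return args


def load_profiles(manifest: Any) -> Dict[str, Profile]:
    """Validate a parsed manifest (hypothetical schema) and index profiles by name."""
    entries = manifest.get("profiles") if isinstance(manifest, dict) else None
    if not isinstance(entries, list):
        raise ValueError("manifest needs a 'profiles' list")
    profiles: Dict[str, Profile] = {}
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"profile #{i} is not a mapping")
        for key in ("name", "model", "display_name"):
            if not isinstance(entry.get(key), str) or not entry[key]:
                raise ValueError(f"profile #{i}: '{key}' must be a non-empty string")
        flags = entry.get("flags")
        if flags is None:
            flags = {}
        if not isinstance(flags, dict):
            raise ValueError(f"profile #{i}: 'flags' must be a mapping")
        if entry["name"] in profiles:
            raise ValueError(f"duplicate profile name {entry['name']!r}")
        profiles[entry["name"]] = Profile(entry["name"], entry["model"],
                                          entry["display_name"], dict(flags))
    return profiles


# Two profiles backed by the same (hypothetical) GGUF with different flags.
EXAMPLE_MANIFEST = {
    "profiles": [
        {"name": "qwen-coder-32k", "model": "/models/qwen-coder.gguf",
         "display_name": "Qwen Coder (32k ctx)",
         "flags": {"ctx-size": 32768, "n-gpu-layers": 99, "mlock": True}},
        {"name": "qwen-coder-8k", "model": "/models/qwen-coder.gguf",
         "display_name": "Qwen Coder (8k ctx)",
         "flags": {"ctx-size": 8192}},
    ]
}

if __name__ == "__main__":
    for profile in load_profiles(EXAMPLE_MANIFEST).values():
        print(profile.display_name, profile.llama_server_args())
```

Because the flag names are passed through verbatim, a typo in the manifest only surfaces when llama-server rejects the argument at startup; validating flag names against a known list would trade away the "no predetermined set of parameters" decision.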