| **Router** | The FastAPI proxy running in Docker (10.0.4.100:9001). Intercepts LLM requests, checks the active model, and routes each request accordingly. |
| **Sidecar** | A lightweight Python service running on the Main PC via systemd. Manages the llama-server subprocess and serves manifest/profile data. |
| **Profile** | A named model configuration from the manifest. Contains a model path, display name, and arbitrary llama-server flags. A single GGUF can have multiple profiles. |
| **Manifest** | A YAML file on the Main PC (`/home/bigt/AI/llm/manifest.yaml`) that lists all available profiles. It is the source of truth for which models Hermes sees. |
| **Model Switch** | The destructive handoff process: stop the current llama-server, start a new one with the chosen profile's flags, and wait for readiness. |
| **Active Model** | The profile currently loaded in llama-server. Queried from the sidecar before each request. |
| **Fallback** | The LXC container (10.0.4.200) running a fixed model. Pure fallback — no switching, no sidecar. Always-on safety net. |
| **Queue** | In-memory buffer that holds requests during a model switch. Hard cap: 120 seconds. Drains once the sidecar reports ready. |
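
To make the Profile and Manifest entries concrete, here is a minimal Python sketch of how several profiles can point at one GGUF while carrying different llama-server flags. It assumes the Manifest has already been parsed (for example with PyYAML's `yaml.safe_load`) into a dict with a top-level `profiles` mapping. That schema, the keys `model`, `display_name` and `flags`, the paths, and the names `Profile`, `load_profiles` and `EXAMPLE_MANIFEST` are all hypothetical, not the real manifest format. Only `-m` (model path) and `--ctx-size` are real llama-server options.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Profile:
    """One named model configuration (hypothetical schema, not the real manifest)."""
    name: str
    model_path: str
    display_name: str
    flags: List[str] = field(default_factory=list)

    def command(self, binary: str = "llama-server") -> List[str]:
        # -m selects the GGUF; the profile's own flags are passed through verbatim.
        return [binary, "-m", self.model_path, *self.flags]


def load_profiles(manifest: Dict) -> Dict[str, Profile]:
    """Build Profile objects from an already-parsed manifest dict (hypothetical helper)."""
    profiles: Dict[str, Profile] = {}
    for name, entry in manifest.get("profiles", {}).items():
        profiles[name] = Profile(
            name=name,
            model_path=entry["model"],
            display_name=entry.get("display_name", name),
            flags=[str(f) for f in entry.get("flags", [])],
        )
    return profiles


# Hypothetical manifest contents: two profiles sharing one GGUF file.
EXAMPLE_MANIFEST = {
    "profiles": {
        "example-8k": {
            "model": "/path/to/example.gguf",
            "display_name": "Example (8k ctx)",
            "flags": ["--ctx-size", "8192"],
        },
        "example-32k": {
            "model": "/path/to/example.gguf",
            "display_name": "Example (32k ctx)",
            "flags": ["--ctx-size", "32768"],
        },
    }
}
```

Keeping flags as an opaque list means the sidecar never needs to understand them, which matches the glossary's "arbitrary llama-server flags".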
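
The Model Switch handoff (stop, start, wait for readiness) might look roughly like this on the Sidecar side. This is a sketch under stated assumptions. `LlamaServerManager`, `wait_for_ready` and `http_ready` are hypothetical names. The readiness check assumes llama-server's `/health` endpoint answers HTTP 200 once the model is loaded. The 120-second default readiness timeout is an assumption chosen to line up with the Queue's hard cap.

```python
import subprocess
import time
import urllib.request
from typing import Callable, List, Optional


def http_ready(url: str) -> Callable[[], bool]:
    """Hypothetical helper: a check that is True once `url` answers HTTP 200."""
    def check() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except OSError:
            # URLError/HTTPError (e.g. 503 while loading) and socket errors are OSError subclasses.
            return False
    return check


def wait_for_ready(check: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `check` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


class LlamaServerManager:
    """Hypothetical Sidecar helper owning at most one llama-server subprocess."""

    def __init__(self) -> None:
        self.proc: Optional[subprocess.Popen] = None
        self.active: Optional[str] = None  # name of the Active Model, if any

    def stop(self, grace: float = 10.0) -> None:
        # Destructive step: the current model is unloaded before the next one starts.
        if self.proc is not None:
            self.proc.terminate()
            try:
                self.proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        self.proc = None
        self.active = None

    def switch(self, name: str, cmd: List[str], check: Callable[[], bool],
               timeout: float = 120.0) -> bool:
        """Stop the current server, start `cmd`, and wait until `check` passes."""
        self.stop()
        self.proc = subprocess.Popen(cmd)
        if wait_for_ready(check, timeout):
            self.active = name
            return True
        self.stop()  # do not leave a half-started server running
        return False
```

In practice `cmd` would be built from the chosen profile's model path and flags (plus whatever port the router expects), and `check` would poll that port's `/health` URL.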
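
On the Router side, the Queue's behaviour (hold requests during a switch, release them when the Sidecar reports ready, give up after 120 seconds) can be approximated with an `asyncio.Event`, which fits naturally in a FastAPI app's event loop. `SwitchGate` and its methods are hypothetical names. This sketch does not keep request bodies in an explicit list. It parks each request coroutine until the event is set. What the Router does after a timeout is not specified here, so the sketch just lets the timeout propagate to the caller.

```python
import asyncio

QUEUE_HARD_CAP_S = 120.0  # the Queue's hard cap from the glossary


class SwitchGate:
    """Hypothetical gate that parks request handlers while a Model Switch runs."""

    def __init__(self, cap: float = QUEUE_HARD_CAP_S) -> None:
        self.cap = cap
        self._ready = asyncio.Event()
        self._ready.set()  # no switch in progress at startup
        self.waiting = 0   # number of requests currently held

    def begin_switch(self) -> None:
        self._ready.clear()

    def end_switch(self) -> None:
        # Called once the Sidecar reports ready; wakes every held request.
        self._ready.set()

    async def wait_turn(self) -> None:
        """Return at once if no switch is running, else wait up to `cap` seconds.

        Raises asyncio.TimeoutError if the switch outlasts the cap.
        """
        if self._ready.is_set():
            return
        self.waiting += 1
        try:
            await asyncio.wait_for(self._ready.wait(), timeout=self.cap)
        finally:
            self.waiting -= 1
```

A request handler would `await gate.wait_turn()` before forwarding upstream, and the switching code would call `begin_switch()` before asking the Sidecar to switch and `end_switch()` once it reports ready.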