Sidecar and llama-server were both configured on port 8080, causing
llama-server to fail on startup (port already in use).
- sidecar/app.py: LLAMA_SERVER_PORT → 8081 (sidecar stays on 8080)
- docker-compose.yml: MAIN_PC_URL → port 8081 (router sends chat
requests to llama-server, not the sidecar)
The sidecar is deployed on port 8080 instead of 8081. Update all:
- Default SIDECAR_PORT in sidecar/app.py
- Default SIDECAR_URL in main.py (router)
- deploy/llm-sidecar.service Environment
- deploy/README.md (.env example + config table)
- All 7 test files (conftest, circuit-breaker, fallback, queue,
model-detection, sse-progress, v1-models)
Issue #2: Manifest schema + Sidecar foundation
- sidecar/manifest.py: YAML manifest loading and profile validation
- sidecar/app.py: FastAPI sidecar service with /models/available, /models/status endpoints
- Router GET /v1/models: proxies to sidecar, returns OpenAI-compatible model list
- Tests: 12 manifest tests, 6 sidecar endpoint tests, 3 router tests (21 total)
Issue #3: Sidecar model switch + Router request queue
- Sidecar POST /models/switch: stops current llama-server, starts new one, polls for readiness
- Switch lock prevents concurrent switches (threading.Lock for TestClient compatibility)
- Router request queue: max 10 requests, 120s hard timeout, 429 when full
- Router automatic model detection: extracts model from chat body, matches against sidecar status
- Full proxy endpoint with Sidecar → Main PC routing and fallback chain
- Tests: 5 sidecar switch tests, 4 queue tests, 3 router integration tests (12 total)
Total: 33 tests, all passing