intelligence-router

Author	SHA1	Message	Date
root	7e86a30bd8	fix: resolve port conflict between sidecar and llama-server. Sidecar and llama-server were both configured on port 8080, causing llama-server to fail on startup (port already in use). Changes: sidecar/app.py: LLAMA_SERVER_PORT → 8081 (sidecar stays on 8080); docker-compose.yml: MAIN_PC_URL → port 8081 (router sends chat requests to llama-server, not the sidecar).	2026-06-15 15:31:31 +00:00
Tudorel Oprisan	af12370632	changed llama-server location	2026-06-15 16:10:49 +01:00
root	45417068ae	fix: change sidecar port from 8081 to 8080. The sidecar is deployed on port 8080 instead of 8081. Updated: default SIDECAR_PORT in sidecar/app.py; default SIDECAR_URL in main.py (router); deploy/llm-sidecar.service Environment; deploy/README.md (.env example + config table); all 7 test files (conftest, circuit-breaker, fallback, queue, model-detection, sse-progress, v1-models).	2026-06-15 13:17:31 +00:00
root	c491779248	Epic: Model Switching via Sidecar - Issues #2-#3. Issue #2 (manifest schema + sidecar foundation): sidecar/manifest.py: YAML manifest loading and profile validation; sidecar/app.py: FastAPI sidecar service with /models/available and /models/status endpoints; router GET /v1/models: proxies to sidecar, returns OpenAI-compatible model list; tests: 12 manifest tests, 6 sidecar endpoint tests, 3 router tests (21 total). Issue #3 (sidecar model switch + router request queue): sidecar POST /models/switch: stops current llama-server, starts new one, polls for readiness; switch lock prevents concurrent switches (threading.Lock for TestClient compatibility); router request queue: max 10 requests, 120s hard timeout, 429 when full; router automatic model detection: extracts model from chat body, matches against sidecar status; full proxy endpoint with Sidecar → Main PC routing and fallback chain; tests: 5 sidecar switch tests, 4 queue tests, 3 router integration tests (12 total). Total: 33 tests, all passing.	2026-06-15 00:49:24 +00:00

4 Commits
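
The request queue described in commit c491779248 (max 10 requests, 120s hard timeout, 429 when full) could look roughly like the sketch below. This is not the repository's code: `BoundedGate`, `QueueFull`, `wait_turn`, `set_ready` and `set_busy` are hypothetical names, and the sketch assumes requests only wait while a model switch is in progress. An HTTP layer would presumably map `QueueFull` to a 429 response and a timeout to an error response.

```python
import asyncio


class QueueFull(Exception):
    """Raised when too many requests are already waiting (maps to HTTP 429)."""


class BoundedGate:
    """Hypothetical waiting room: callers block until the backend is ready."""

    def __init__(self, max_waiting: int = 10, timeout_s: float = 120.0) -> None:
        self.max_waiting = max_waiting  # limit taken from the commit message
        self.timeout_s = timeout_s      # hard timeout taken from the commit message
        self._waiting = 0
        self._ready = asyncio.Event()

    def set_ready(self) -> None:
        # Assumed to be called once the model switch has finished.
        self._ready.set()

    def set_busy(self) -> None:
        # Assumed to be called when a model switch starts.
        self._ready.clear()

    async def wait_turn(self) -> None:
        if self._ready.is_set():
            return  # nothing to wait for
        if self._waiting >= self.max_waiting:
            raise QueueFull("too many requests waiting")
        self._waiting += 1
        try:
            # Raises asyncio.TimeoutError if the backend is not ready in time.
            await asyncio.wait_for(self._ready.wait(), timeout=self.timeout_s)
        finally:
            self._waiting -= 1
```

Counting waiters instead of storing them in an `asyncio.Queue` keeps the sketch short; it bounds how many requests wait, but does not guarantee the order in which they resume.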