intelligence-router

Author	SHA1	Message	Date
root	4ee85972ec	fix: convert underscores to hyphens in llama-server flag names, fix n_ctx→ctx-size rename Two changes to fix 'error: invalid argument: --n-ctx' during model switch: 1. sidecar/app.py: Added _flag_key() converter that normalises underscores to hyphens in flag names and handles the n_ctx→ctx-size rename. The code now converts e.g. n_gpu_layers → n-gpu-layers, top_p → top-p, top_k → top-k, min_p → min-p before passing to llama-server CLI. 2. deploy/manifest.yaml: Updated all 20 profiles to use correct llama-server flag names: n_ctx→ctx-size, n_gpu_layers→n-gpu-layers, top_p→top-p, top_k→top-k, min_p→min-p. All flags now use hyphens, matching what llama-server actually accepts.	2026-06-16 20:54:32 +00:00
root	1551c281c2	fix: move llama-server stderr log from /tmp to working dir (ReadWritePaths compat) The sidecar systemd service has ProtectSystem=strict and ReadWritePaths=/home/bigt/AI/llm, making /tmp read-only. Writing /tmp/llama-server-stderr.log failed with EROFS. Changed LLAMA_STDERR_LOG to os.path.join(dirname(MANIFEST_PATH), ...), resolving to /home/bigt/AI/llm/llama-server-stderr.log, which is within the allowed ReadWritePaths.	2026-06-16 20:36:10 +00:00
root	37fee5341e	fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout Four fixes for the model-not-loading bug: 1. YAML boolean → CLI flag bug: YAML parses 'on'/'off'/'yes'/'no' as Python bools. str(True)='True', which is INVALID for llama.cpp's --flash-attn flag (expects 'on'/'off'/'auto'). Added _flag_value() converter that maps bools to 'on'/'off' strings. 2. llama-server stderr was DEVNULL: All error messages (bad model path, OOM, invalid flag) were invisible. Now captured to /tmp/llama-server-stderr.log and dumped to the sidecar log on failure. 3. Reduce polling timeout: 240 retries × 0.5s = 120s hang. Reduced to 60 retries × 0.5s = 30s. Still dumps stderr + exit code on failure. 4. Manifest VRAM fix: gemma4-26b-compact-long-128k used q8_0 KV cache at 128K context (~24GB on 24GB RTX 3090, borderline OOM). Changed to q4_0 (~18GB, comfortable).	2026-06-16 00:06:45 +00:00
root	36abbf573e	fix: unbuffer sidecar stdout so logs appear in journalctl	2026-06-15 16:25:58 +00:00
Tudorel Oprisan	1e9305395e	Fixed llama-server path	2026-06-15 17:01:53 +01:00
root	7e86a30bd8	fix: resolve port conflict between sidecar and llama-server Sidecar and llama-server were both configured on port 8080, causing llama-server to fail on startup (port already in use). - sidecar/app.py: LLAMA_SERVER_PORT → 8081 (sidecar stays on 8080) - docker-compose.yml: MAIN_PC_URL → port 8081 (router sends chat requests to llama-server, not the sidecar)	2026-06-15 15:31:31 +00:00
Tudorel Oprisan	af12370632	changed llama-server location	2026-06-15 16:10:49 +01:00
root	45417068ae	fix: change sidecar port from 8081 to 8080 The sidecar is deployed on port 8080 instead of 8081. Update all: - Default SIDECAR_PORT in sidecar/app.py - Default SIDECAR_URL in main.py (router) - deploy/llm-sidecar.service Environment - deploy/README.md (.env example + config table) - All 7 test files (conftest, circuit-breaker, fallback, queue, model-detection, sse-progress, v1-models)	2026-06-15 13:17:31 +00:00
root	c491779248	Epic: Model Switching via Sidecar — Issues #2-#3 Issue #2: Manifest schema + Sidecar foundation - sidecar/manifest.py: YAML manifest loading and profile validation - sidecar/app.py: FastAPI sidecar service with /models/available, /models/status endpoints - Router GET /v1/models: proxies to sidecar, returns OpenAI-compatible model list - Tests: 12 manifest tests, 6 sidecar endpoint tests, 3 router tests (21 total) Issue #3: Sidecar model switch + Router request queue - Sidecar POST /models/switch: stops current llama-server, starts new one, polls for readiness - Switch lock prevents concurrent switches (threading.Lock for TestClient compatibility) - Router request queue: max 10 requests, 120s hard timeout, 429 when full - Router automatic model detection: extracts model from chat body, matches against sidecar status - Full proxy endpoint with Sidecar → Main PC routing and fallback chain - Tests: 5 sidecar switch tests, 4 queue tests, 3 router integration tests (12 total) Total: 33 tests, all passing	2026-06-15 00:49:24 +00:00
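
Commits 4ee85972ec and 37fee5341e describe two converters, _flag_key() and _flag_value(), that turn manifest profile entries into llama-server CLI arguments. Their bodies are not shown in this log, so the following is only a minimal sketch of what such converters might look like. The names flag_key, flag_value, and build_args, and the sample profile, are hypothetical stand-ins, and the assumption that every flag is emitted as a "--name value" pair is mine.

```python
# Hypothetical sketch of the conversions described in 4ee85972ec and 37fee5341e.
# These are illustrative stand-ins, not the actual sidecar/app.py code.

# Renames that go beyond a plain underscore-to-hyphen swap (from the commit text).
RENAMES = {"n_ctx": "ctx-size"}


def flag_key(name: str) -> str:
    """Map a manifest key to a CLI flag name, e.g. n_gpu_layers -> n-gpu-layers."""
    return RENAMES.get(name, name.replace("_", "-"))


def flag_value(value) -> str:
    """Map a parsed YAML value to a CLI string; bools become 'on'/'off'."""
    if isinstance(value, bool):
        return "on" if value else "off"
    return str(value)


def build_args(profile: dict) -> list:
    """Assumption: every profile entry is passed as '--flag value'."""
    args = []
    for name, value in profile.items():
        args += ["--" + flag_key(name), flag_value(value)]
    return args


# Sample profile (values are made up for illustration).
print(build_args({"n_ctx": 8192, "n_gpu_layers": 99, "flash_attn": True, "top_p": 0.9}))
```

The bool check matters because the YAML loader has already turned a manifest value such as on into True by the time the sidecar sees it, so str() alone would produce 'True'.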

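Commits 1551c281c2 and 37fee5341e describe deriving the stderr log path from the manifest directory and polling llama-server for readiness with 60 retries at 0.5s intervals. A rough sketch of both ideas follows. The manifest filename, the wait_until_ready helper, and its callback parameters are assumptions made for illustration; the real sidecar presumably probes llama-server over HTTP and inspects the child process directly.

```python
import os
import time
from typing import Callable, Optional

# Assumption: the manifest filename is illustrative; only the directory
# /home/bigt/AI/llm appears in the commit message.
MANIFEST_PATH = "/home/bigt/AI/llm/manifest.yaml"

# Keep the log next to the manifest so it falls inside ReadWritePaths.
LLAMA_STDERR_LOG = os.path.join(os.path.dirname(MANIFEST_PATH), "llama-server-stderr.log")


def wait_until_ready(
    is_ready: Callable[[], bool],
    exit_code: Callable[[], Optional[int]],
    retries: int = 60,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: poll until ready, the process exits, or retries run out.

    is_ready   -- returns True once the server answers its health probe
    exit_code  -- returns None while the process is alive, else its exit code
    """
    for _ in range(retries):
        if exit_code() is not None:
            return False  # process died; the caller can dump LLAMA_STDERR_LOG
        if is_ready():
            return True
        sleep(interval)
    return False  # worst case: retries * interval seconds (60 * 0.5 = 30)
```

Passing the probes in as callables keeps the loop testable without starting a real process; in the sidecar they would wrap something like a health request and the child's poll() result.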
9 Commits
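
Commit c491779248 describes a router request queue with at most 10 requests, a 120s hard timeout, and a 429 response when full. The log does not show how it is built, so the sketch below is only one plausible shape under stated assumptions: the RequestQueue and QueueFull names are hypothetical, and the real router may queue requests differently (for example, holding them while a model switch is in progress).

```python
import asyncio


class QueueFull(Exception):
    """Hypothetical signal that the router would translate into HTTP 429."""


class RequestQueue:
    """Illustrative bounded queue: caps in-flight requests and enforces a timeout."""

    def __init__(self, max_pending: int = 10, timeout: float = 120.0):
        self.max_pending = max_pending
        self.timeout = timeout
        self._pending = 0

    async def run(self, make_call):
        """Run make_call() if there is room, otherwise reject immediately."""
        if self._pending >= self.max_pending:
            raise QueueFull()
        self._pending += 1
        try:
            # Hard timeout: raises asyncio.TimeoutError after self.timeout seconds.
            return await asyncio.wait_for(make_call(), timeout=self.timeout)
        finally:
            self._pending -= 1
```

Rejecting up front rather than blocking keeps a stalled model switch from piling up an unbounded backlog of client connections.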