intelligence-router

History

root 37fee5341e fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout Three fixes for the model-not-loading bug: 1. YAML boolean → CLI flag bug: YAML parses 'on'/'off'/'yes'/'no' as Python bools. str(True)='True' which is INVALID for llama.cpp's --flash-attn flag (expects 'on'/'off'/'auto'). Added _flag_value() converter that maps bools to 'on'/'off' strings. 2. llama-server stderr was DEVNULL: All error messages (bad model path, OOM, invalid flag) were invisible. Now captured to /tmp/llama-server-stderr.log and dumped to the sidecar log on failure. 3. Reduce polling timeout: 240 retries × 0.5s = 120s hang. Reduced to 60 retries × 0.5s = 30s. Still dumps stderr + exit code on failure. 4. Manifest VRAM fix: gemma4-26b-compact-long-128k used q8_0 KV cache at 128K context (~24GB on 24GB RTX 3090 — borderline OOM). Changed to q4_0 (~18GB, comfortable).		2026-06-16 00:06:45 +00:00
..
__init__.py	Epic: Model Switching via Sidecar — Issues #2-#3	2026-06-15 00:49:24 +00:00
app.py	fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout	2026-06-16 00:06:45 +00:00
manifest.py	Epic: Model Switching via Sidecar — Issues #2-#3	2026-06-15 00:49:24 +00:00