intelligence-router/sidecar
root 37fee5341e fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout
Four fixes for the model-not-loading bug:

1. **YAML boolean → CLI flag bug**: the YAML loader parses unquoted 'on'/'off'/'yes'/'no'
   as Python bools, and str(True) == 'True' is not a valid value for llama.cpp's
   --flash-attn flag (it expects 'on'/'off'/'auto'). Added a _flag_value() converter
   that maps bools to 'on'/'off' strings.

2. **llama-server stderr was DEVNULL**: stderr was discarded, so every error message
   (bad model path, OOM, invalid flag) was invisible. It is now captured to
   /tmp/llama-server-stderr.log and dumped to the sidecar log on failure.

3. **Reduce polling timeout**: 240 retries × 0.5s meant a failed load hung for 120s.
   Reduced to 60 retries × 0.5s = 30s. On failure it still dumps stderr and the exit code.

4. **Manifest VRAM fix**: gemma4-26b-compact-long-128k used a q8_0 KV cache at
   128K context (~24GB on a 24GB RTX 3090, borderline OOM). Changed to q4_0
   (~18GB, comfortable).
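Fix 2 can be sketched with the standard subprocess module. This assumes the sidecar launches llama-server via subprocess.Popen. Only the log path comes from the commit. start_server() and stderr_tail() are hypothetical names.

```python
import subprocess
from pathlib import Path

# Path from the commit message; the rest of this block is an illustrative sketch.
STDERR_LOG = Path("/tmp/llama-server-stderr.log")


def start_server(cmd: list[str], log_path: Path = STDERR_LOG) -> subprocess.Popen:
    """Launch the server with stderr written to a file instead of DEVNULL."""
    # "wb" truncates any log left over from a previous run. The child inherits
    # its own copy of the descriptor, so closing ours when the with-block ends is safe.
    with open(log_path, "wb") as log_file:
        return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=log_file)


def stderr_tail(log_path: Path = STDERR_LOG, max_bytes: int = 4096) -> str:
    """Return the last max_bytes of captured stderr, for dumping to the sidecar log."""
    try:
        data = log_path.read_bytes()
    except FileNotFoundError:
        return ""
    return data[-max_bytes:].decode("utf-8", errors="replace")
```

On a failed start, the sidecar would log stderr_tail() alongside proc.returncode, so messages such as a bad model path or an OOM become visible.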
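The bounded polling loop from fix 3, as a sketch. is_ready is a hypothetical callable, for example a request to llama-server's /health endpoint. The early return when the server process has already exited is an assumption of this sketch, not something the commit states.

```python
import subprocess
import time
from typing import Callable, Optional


def wait_until_ready(
    is_ready: Callable[[], bool],
    proc: Optional[subprocess.Popen] = None,
    retries: int = 60,
    interval: float = 0.5,
) -> bool:
    """Poll is_ready() up to `retries` times, sleeping `interval` seconds between tries.

    With the defaults the worst case is 60 x 0.5s = 30s (previously 240 x 0.5s = 120s).
    If proc is given and has already exited, stop early: polling cannot succeed.
    """
    for _ in range(retries):
        if is_ready():
            return True
        if proc is not None and proc.poll() is not None:
            return False
        time.sleep(interval)
    return False
```

On a False return, the caller would then dump the captured stderr and proc.returncode, as the commit describes.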
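The ~24GB and ~18GB figures in fix 4 come from the commit. The sketch below does not reproduce them: they presumably include model weights, and the model's layer and head counts are not given here. It only shows why moving the KV cache from q8_0 to q4_0 roughly halves that part of the memory. The estimate uses llama.cpp's block layouts (32 values plus one fp16 scale per block) and naively assumes every layer caches the full context. The architecture values in the example are placeholders.

```python
# Bytes per cached element for llama.cpp KV cache types.
# q8_0 and q4_0 store blocks of 32 values plus one fp16 scale:
#   q8_0: 32 x 1 byte   + 2 = 34 bytes per 32 values
#   q4_0: 32 x 0.5 byte + 2 = 18 bytes per 32 values
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int, cache_type: str) -> float:
    """Naive KV cache size: K and V for every layer, token and KV head.

    Assumes every layer keeps the full context (no sliding-window savings).
    """
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elements * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Placeholder architecture values, NOT the real gemma4-26b config.
    dims = dict(n_layers=48, n_ctx=128 * 1024, n_kv_heads=8, head_dim=128)
    for t in ("q8_0", "q4_0"):
        print(f"{t}: {kv_cache_bytes(cache_type=t, **dims) / 2**30:.2f} GiB")
```

Whatever the real dimensions are, q4_0 needs 18/34 (about 53%) of the q8_0 cache.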
2026-06-16 00:06:45 +00:00
__init__.py Epic: Model Switching via Sidecar — Issues #2-#3 2026-06-15 00:49:24 +00:00
app.py fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout 2026-06-16 00:06:45 +00:00
manifest.py Epic: Model Switching via Sidecar — Issues #2-#3 2026-06-15 00:49:24 +00:00