Three fixes for the model-not-loading bug: 1. **YAML boolean → CLI flag bug**: YAML parses 'on'/'off'/'yes'/'no' as Python bools. str(True)='True' which is INVALID for llama.cpp's --flash-attn flag (expects 'on'/'off'/'auto'). Added _flag_value() converter that maps bools to 'on'/'off' strings. 2. **llama-server stderr was DEVNULL**: All error messages (bad model path, OOM, invalid flag) were invisible. Now captured to /tmp/llama-server-stderr.log and dumped to the sidecar log on failure. 3. **Reduce polling timeout**: 240 retries × 0.5s = 120s hang. Reduced to 60 retries × 0.5s = 30s. Still dumps stderr + exit code on failure. 4. **Manifest VRAM fix**: gemma4-26b-compact-long-128k used q8_0 KV cache at 128K context (~24GB on 24GB RTX 3090 — borderline OOM). Changed to q4_0 (~18GB, comfortable). |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| app.py | ||
| manifest.py | ||