The FIRST request that triggers a model switch was blocking the HTTP response for 10-30s while waiting for the sidecar to load the model. Hermes Desktop's client timed out during this wait, causing 'nothing happens' on new session.

Fix: refactored the proxy handler so ALL requests during a model switch use the same SSE streaming pattern (immediate 200, progress events, then actual response piped through after switch completes). The switch now runs as a background asyncio task via create_task().

- Added _background_switch(): runs POST /models/switch in a background task, with complete_switch() + drain_queue() in a finally block
- All switch-triggering requests go through queue_request() + StreamingResponse
- SSE generator now falls through to OpenRouter/LXC if Main PC is unreachable (switch failure case) instead of hanging indefinitely

Sidecar fixes from previous commit:

- _kill_llama_server() is now async with proper await on process termination
- _switch_lock changed from threading.Lock to asyncio.Lock()

| | | |
|---|---|---|
| .hermes/plans | ||
| deploy | ||
| docs | ||
| scripts | ||
| sidecar | ||
| tests | ||
| .env | ||
| .gitignore | ||
| CONTEXT.md | ||
| docker-compose.yml | ||
| Dockerfile | ||
| main.py | ||
| pytest.ini | ||
| requirements.txt | ||
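
The streaming pattern described in the commit message (immediate response, progress events while the switch runs in a background task, then the real response or a fallback) might look roughly like the sketch below. It uses only the standard library and is not the code in main.py: `ModelSwitcher`, `stream_during_switch`, `sse`, and the `primary`/`fallback` callables are hypothetical names introduced for illustration, and the queueing done by queue_request() and drain_queue() is left out.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, Optional


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class ModelSwitcher:
    """Hypothetical stand-in for the proxy's switch state (not the real class)."""

    def __init__(self, do_switch: Callable[[str], Awaitable[None]]):
        self._do_switch = do_switch      # assumed to wrap POST /models/switch
        self._lock = asyncio.Lock()      # asyncio.Lock, so waiting never blocks the loop
        self._task: Optional[asyncio.Task] = None
        self.error: Optional[BaseException] = None

    async def _background_switch(self, model: str) -> None:
        async with self._lock:
            try:
                await self._do_switch(model)
                self.error = None
            except Exception as exc:     # record the failure instead of raising
                self.error = exc

    def start(self, model: str) -> asyncio.Task:
        """Start a switch in the background, reusing one that is still running."""
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._background_switch(model))
        return self._task


async def stream_during_switch(
    switcher: ModelSwitcher,
    model: str,
    primary: Callable[[], Awaitable[dict]],
    fallback: Callable[[], Awaitable[dict]],
    poll_s: float = 0.5,
) -> AsyncIterator[str]:
    """Yield progress frames until the switch finishes, then the real response."""
    task = switcher.start(model)
    # First frame goes out immediately, so the client sees activity right away.
    yield sse("progress", {"status": "switching", "model": model})
    while not task.done():
        await asyncio.wait({task}, timeout=poll_s)
        if not task.done():
            yield sse("progress", {"status": "loading", "model": model})
    # On switch failure, fall through to the fallback backend instead of hanging.
    backend = fallback if switcher.error else primary
    yield sse("message", await backend())
```

In a FastAPI or Starlette handler, a generator like this would typically be wrapped in a StreamingResponse with media type `text/event-stream`; that wiring is assumed here and not shown.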