root 45dd793b69 fix: sidecar process kill was not awaiting wait() — old server held GPU VRAM
- _kill_llama_server() was a sync function that called process.wait() without awaiting
  it. The call only created a coroutine object that was discarded, so the old llama-server
  was never waited on and had not released GPU memory before the new one started,
  causing OOM on rapid model switches. Fixed by making it async and awaiting wait(),
  with a 10s SIGTERM timeout and a SIGKILL fallback.

- Changed _switch_lock from threading.Lock to asyncio.Lock() to prevent event loop
  deadlock during long switch operations.

- Router proxy: only trigger model switches for POST /v1/chat/completions and
  /v1/completions. Non-chat endpoints (GET probes, /api/show) no longer trigger
  unwanted model reloads.

- _ollama_show_lookup: return active profile context size when model_name is empty.
  Previously returned 404, causing Hermes Desktop to default to 256k context.

- Always call drain_queue() and complete_switch() after a switch failure so queued
  requests don't hang forever waiting on a switching event that is never set.
2026-06-17 23:49:57 +00:00
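
The kill fix described in the first bullet can be sketched as follows. This is a minimal illustration, not the code from sidecar/ or main.py: the names stop_process and demo are hypothetical, and it assumes the server is started through asyncio's subprocess API. A short-lived Python child stands in for llama-server.

```python
import asyncio
import sys


async def stop_process(process: asyncio.subprocess.Process, timeout: float = 10.0) -> int:
    """Terminate a child process and return only once it has exited.

    Hypothetical helper: sends SIGTERM, waits up to `timeout` seconds,
    then falls back to SIGKILL.
    """
    if process.returncode is not None:
        return process.returncode  # already exited, nothing to do

    try:
        process.terminate()  # SIGTERM on POSIX
    except ProcessLookupError:
        pass  # the child exited between the check above and this call

    try:
        # Awaiting wait() is the essential step: calling it without `await`
        # only builds a coroutine object and never waits for the child.
        return await asyncio.wait_for(process.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        process.kill()  # SIGKILL fallback for a child that ignores SIGTERM
        return await process.wait()


async def demo() -> int:
    # Stand-in for llama-server (assumption): a child that sleeps for a minute.
    child = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await stop_process(child, timeout=10.0)


if __name__ == "__main__":
    print("child exit code:", asyncio.run(demo()))
```

Because stop_process returns only after the child has exited, a caller that awaits it before launching the next server does not overlap two processes on the same GPU.
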
.hermes/plans fix: add probe endpoints and no-model fallback for Hermes Desktop compatibility 2026-06-15 15:22:15 +00:00
deploy fix: convert underscores to hyphens in llama-server flag names, fix n_ctx→ctx-size rename 2026-06-16 20:54:32 +00:00
docs Added next changes 2026-06-15 00:09:31 +00:00
scripts feat: add sync_models.py script to auto-update Hermes custom_providers from router model list 2026-06-15 21:10:36 +00:00
sidecar fix: sidecar process kill was not awaiting wait() — old server held GPU VRAM 2026-06-17 23:49:57 +00:00
tests fix: change sidecar port from 8081 to 8080 2026-06-15 13:17:31 +00:00
.env .env 2026-06-09 13:57:22 +03:00
.gitignore Epic: Model Switching via Sidecar — Issues #2-#3 2026-06-15 00:49:24 +00:00
CONTEXT.md Epic: Model Switching via Sidecar — Issues #4-#7 + #8 deployment 2026-06-15 01:13:36 +00:00
docker-compose.yml fix: resolve port conflict between sidecar and llama-server 2026-06-15 15:31:31 +00:00
Dockerfile Initial commit: migrate intelligence-router files 2026-06-09 11:48:43 +01:00
main.py fix: sidecar process kill was not awaiting wait() — old server held GPU VRAM 2026-06-17 23:49:57 +00:00
pytest.ini feat: add 15 model profiles to manifest.yaml 2026-06-15 12:34:46 +00:00
requirements.txt Epic: Model Switching via Sidecar — Issues #4-#7 + #8 deployment 2026-06-15 01:13:36 +00:00