intelligence-router

Author	SHA1	Message	Date
root	b3ac21b2c0	fix: first request no longer blocks on model switch — uses background task + SSE The FIRST request that triggers a model switch was blocking the HTTP response for 10-30s while waiting for the sidecar to load the model. Hermes Desktop's client timed out during this wait, causing 'nothing happens' on new session. Fix: refactored the proxy handler so ALL requests during a model switch use the same SSE streaming pattern (immediate 200, progress events, then actual response piped through after switch completes). The switch now runs as a background asyncio task via create_task(). - Added _background_switch() — runs POST /models/switch in background task with complete_switch() + drain_queue() in finally block - All switch-triggering requests go through queue_request() + StreamingResponse - SSE generator now falls through to OpenRouter/LXC if Main PC unreachable (switch failure case) instead of hanging indefinitely Sidecar fixes from previous commit: - _kill_llama_server() is now async with proper await on process termination - _switch_lock changed from threading.Lock to asyncio.Lock()	2026-06-18 00:10:48 +00:00
root	45dd793b69	fix: sidecar process kill was not awaiting wait(); old server held GPU VRAM - _kill_llama_server() was a synchronous function that called process.wait() without awaiting it, which only created a discarded coroutine object. The old llama-server was therefore never waited on to release GPU memory before a new one started, causing OOM on rapid model switches. Fixed with async await + 10s SIGTERM timeout + SIGKILL fallback. - Changed _switch_lock from threading.Lock to asyncio.Lock() to prevent event loop deadlock during long switch operations. - Router proxy: only trigger model switches for POST /v1/chat/completions and /v1/completions. Non-chat endpoints (GET probes, /api/show) no longer trigger unwanted model reloads. - _ollama_show_lookup: return active profile context size when model_name is empty. Previously returned 404, causing Hermes Desktop to default to 256k context. - Always drain_queue() + complete_switch() after switch failure so queued requests don't hang forever waiting on a never-set switching event.	2026-06-17 23:49:57 +00:00
root	7e9b3f43e1	fix: circuit breaker deadlock — always query sidecar for status The circuit breaker opened after MAX_RECOVERY_ATTEMPTS failures but was never reset because the sidecar status query (which calls circuit_reset()) was skipped when the circuit was open. This caused a permanent deadlock: all subsequent requests went to the LXC fallback with no recovery possible. Fix: always query the sidecar for /models/status, even when the circuit is open. If the sidecar responds successfully, reset the circuit. The circuit breaker now only prevents the SWITCH operation, not the status health check. If a model is already running when the circuit is open, route to it directly.	2026-06-16 22:09:16 +00:00
root	75248741e7	fix: log exceptions on primary proxy target When the primary request to llama-server (10.0.4.11:8081) raises an exception (connection refused, timeout), it was silently swallowed by the catch-all except block, making it look like a sidecar/switch failure when it was actually a network-level error. Now prints: 'PROXY EXCEPTION on primary <url>: <ExceptionType>: <msg>'	2026-06-16 21:32:36 +00:00
root	5c1753dfef	fix: log sidecar switch failures + fix scoping bug in proxy handler Two changes to debug the fallback-to-LXC issue: 1. Added debug logging on switch failure: prints the profile name, sidecar response status, and error message. Also calls circuit_record_failure() so subsequent requests don't wait the full 120-second timeout before falling back. 2. Fixed scoping bug: sidecar_status was only defined inside the else branch of the circuit breaker check. Initialized to None at function scope alongside target_url and error to prevent NameError when circuit is open.	2026-06-16 21:25:42 +00:00
root	f2e62f60e6	fix: /api/show GET support, /v1 root handler, and proxy debug logging Three changes to debug and fix Hermes Desktop integration: 1. /api/show: Added GET handler alongside existing POST handler. Hermes Desktop probes with GET ?model=xxx, not POST body. Refactored shared lookup logic into _ollama_show_lookup(). 2. /v1 root: Added handler returning basic info. Hermes Desktop probes this URL and ERR_CONNECTION_REFUSED was blocking full provider validation. 3. Proxy execute(): Added debug logging for non-200 responses. Prints the backend URL, status code, and first 500 bytes of body to help diagnose why llama-server returns 400 on /v1/chat/completions.	2026-06-16 21:16:45 +00:00
root	d935339280	fix: report actual profile context size in /api/show probe endpoint Hermes Desktop reads the context size from /api/show's 'parameters' field. This was hardcoded to 'num_ctx 4096' for every model, causing 'context too small' errors when the user's system prompt + conversation exceeded 4K tokens. Now extracts the actual ctx-size from the profile's flags and returns the correct value (e.g. 'num_ctx 131072' for the 128K profiles).	2026-06-16 21:04:40 +00:00
root	2c23faa4a1	fix: add probe endpoints and no-model fallback for Hermes Desktop compatibility Hermes Desktop sends probe requests to validate providers before allowing model switching. The router was returning 503 for all of these because the catch-all proxy requires a 'model' field in the request body. Added explicit handlers for: - GET /v1/models/{model_id} — OpenAI single-model lookup - GET /api/tags — Ollama model list discovery - POST /api/show — Ollama model info - GET /api/v1/models — Ollama-compatible model list - GET /v1/props, GET /props — llama.cpp server properties - GET /version — llama.cpp version Also fixed the catch-all proxy to route requests with no model body to the currently active backend instead of returning 503.	2026-06-15 15:22:15 +00:00
root	45417068ae	fix: change sidecar port from 8081 to 8080 The sidecar is deployed on port 8080 instead of 8081. Update all: - Default SIDECAR_PORT in sidecar/app.py - Default SIDECAR_URL in main.py (router) - deploy/llm-sidecar.service Environment - deploy/README.md (.env example + config table) - All 7 test files (conftest, circuit-breaker, fallback, queue, model-detection, sse-progress, v1-models)	2026-06-15 13:17:31 +00:00
root	4914363089	Epic: Model Switching via Sidecar — Issues #4-#7 + #8 deployment Issue #4: Automatic model detection and switch - Router extracts model from chat body, queries sidecar, triggers switch on mismatch - Matching active model routes directly to Main PC - No active model triggers cold start switch - Tests: 4 test_router_model_detection.py Issue #5: SSE switch progress feedback - _sse_format() correctly serializes SSE events - sse_progress_stream() generates phase progression events - Proxy yields SSE events then actual response - Tests: 3 test_router_sse_progress.py Issue #6: Circuit breaker + OpenRouter fallback - Circuit tracks Sidecar failures, opens after MAX_RECOVERY_ATTEMPTS (3) - OpenRouter API key from env, no longer uses x-intelligence-level header - Fixes: OPENROUTER_BASE, SSE format, circuit state isolation - Tests: 7 test_router_circuit_breaker.py Issue #7: LXC fallback chain completion - Full fallback: Main PC → OpenRouter → LXC - Each backend health-checked via /v1/models before routing - All backends down → 503 response - Fixed: execute() wrapped in try/except to trigger fallback chain - Tests: 3 test_router_fallback_lxc.py Issue #8: Systemd service deployment - deploy/llm-sidecar.service: systemd unit with Restart=always - deploy/manifest.yaml: example manifest with 3 profiles - deploy/README.md: deployment instructions - Updated: docker-compose.yml, requirements.txt, Dockerfile Test framework improvements: - tests/conftest.py: shared URL patches for all router tests - Fixed global state pollution in circuit breaker tests - Fixed test sidecar switch test (AsyncMock for async function) Total: 42 tests passing	2026-06-15 01:13:36 +00:00
root	c491779248	Epic: Model Switching via Sidecar — Issues #2-#3 Issue #2: Manifest schema + Sidecar foundation - sidecar/manifest.py: YAML manifest loading and profile validation - sidecar/app.py: FastAPI sidecar service with /models/available, /models/status endpoints - Router GET /v1/models: proxies to sidecar, returns OpenAI-compatible model list - Tests: 12 manifest tests, 6 sidecar endpoint tests, 3 router tests (21 total) Issue #3: Sidecar model switch + Router request queue - Sidecar POST /models/switch: stops current llama-server, starts new one, polls for readiness - Switch lock prevents concurrent switches (threading.Lock for TestClient compatibility) - Router request queue: max 10 requests, 120s hard timeout, 429 when full - Router automatic model detection: extracts model from chat body, matches against sidecar status - Full proxy endpoint with Sidecar → Main PC routing and fallback chain - Tests: 5 sidecar switch tests, 4 queue tests, 3 router integration tests (12 total) Total: 33 tests, all passing	2026-06-15 00:49:24 +00:00
Tudorel Oprisan	712fe041b1	test	2026-06-09 19:54:03 +01:00
Tudorel Oprisan	1a7dd550ec	added debug	2026-06-09 18:05:10 +01:00
Tudorel Oprisan	d7090b1644	Fix build context, port conflict, and improve proxy/health-check logic	2026-06-09 17:34:07 +01:00
Tudorel Oprisan	0e05390be2	Initial commit: migrate intelligence-router files	2026-06-09 11:48:43 +01:00
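
Note on commit 45dd793b69: the fix it describes (await the process exit, allow 10 seconds after SIGTERM, then fall back to SIGKILL) can be sketched roughly as below. This is a minimal illustration and not the repository's code. The names terminate_process and demo are hypothetical, and a sleeping Python interpreter stands in for llama-server.

```python
import asyncio
import sys


async def terminate_process(process: asyncio.subprocess.Process,
                            grace_seconds: float = 10.0) -> int:
    """Stop a child process and return only once it has really exited."""
    if process.returncode is not None:
        return process.returncode  # already exited, nothing to do

    process.terminate()  # SIGTERM on POSIX
    try:
        # The await matters: calling process.wait() without it only builds
        # a coroutine object and never waits for the child to exit.
        return await asyncio.wait_for(process.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        process.kill()  # SIGKILL fallback after the grace period
        return await process.wait()


async def demo() -> int:
    # Hypothetical stand-in for llama-server: a child that sleeps for a minute.
    child = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await terminate_process(child, grace_seconds=10.0)


if __name__ == "__main__":
    print("exit code:", asyncio.run(demo()))
```

Because the coroutine returns only after wait() completes, a caller that starts the next server afterwards knows the old process is gone. Whether GPU memory is released at that exact moment depends on the driver and is not something this sketch verifies.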

15 Commits
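
Note on commit d935339280: reporting the profile's real context size instead of a hardcoded 'num_ctx 4096' could look roughly like the sketch below. It assumes a profile stores its llama-server flags as a flat list of strings, which the log does not confirm. The helper names are hypothetical, and falling back to 4096 when no flag is present is an assumption made for the example.

```python
from typing import Optional, Sequence

DEFAULT_NUM_CTX = 4096  # the previously hardcoded value; used here only as a fallback


def ctx_size_from_flags(flags: Sequence[str]) -> Optional[int]:
    """Return the value that follows --ctx-size (or -c) in a flag list, if any."""
    for i, flag in enumerate(flags):
        if flag in ("--ctx-size", "-c") and i + 1 < len(flags):
            return int(flags[i + 1])
    return None


def num_ctx_parameter(flags: Sequence[str]) -> str:
    """Build the 'num_ctx N' string for the 'parameters' field of /api/show."""
    size = ctx_size_from_flags(flags)
    return f"num_ctx {size if size is not None else DEFAULT_NUM_CTX}"


if __name__ == "__main__":
    # Hypothetical 128K profile.
    print(num_ctx_parameter(["--model", "model.gguf", "--ctx-size", "131072"]))
```

For the hypothetical 128K profile above this yields 'num_ctx 131072', matching the example value quoted in the commit message.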