intelligence-router

Author	SHA1	Message	Date
root	1551c281c2	fix: move llama-server stderr log from /tmp to working dir (ReadWritePaths compat) The sidecar systemd service has ProtectSystem=strict and ReadWritePaths=/home/bigt/AI/llm, making /tmp read-only. Writing /tmp/llama-server-stderr.log failed with EROFS. Changed LLAMA_STDERR_LOG to os.path.join(dirname(MANIFEST_PATH), ...), resolving to /home/bigt/AI/llm/llama-server-stderr.log, which is within the allowed ReadWritePaths.	2026-06-16 20:36:10 +00:00
root	37fee5341e	fix: capture llama-server stderr, fix YAML boolean flag conversion, reduce polling timeout Four fixes for the model-not-loading bug: 1. YAML boolean → CLI flag bug: YAML parses 'on'/'off'/'yes'/'no' as Python bools. str(True)='True' which is INVALID for llama.cpp's --flash-attn flag (expects 'on'/'off'/'auto'). Added _flag_value() converter that maps bools to 'on'/'off' strings. 2. llama-server stderr was DEVNULL: All error messages (bad model path, OOM, invalid flag) were invisible. Now captured to /tmp/llama-server-stderr.log and dumped to the sidecar log on failure. 3. Reduce polling timeout: 240 retries × 0.5s = 120s hang. Reduced to 60 retries × 0.5s = 30s. Still dumps stderr + exit code on failure. 4. Manifest VRAM fix: gemma4-26b-compact-long-128k used q8_0 KV cache at 128K context (~24GB on 24GB RTX 3090, borderline OOM). Changed to q4_0 (~18GB, comfortable).	2026-06-16 00:06:45 +00:00
root	903f06c634	feat: add sync_models.py script to auto-update Hermes custom_providers from router model list	2026-06-15 21:10:36 +00:00
root	95c87a764b	fix: remove non-existent models from manifest (qwen-3-8b, llama-4-maverick), add 3 newly discovered models	2026-06-15 16:38:17 +00:00
root	36abbf573e	fix: unbuffer sidecar stdout so logs appear in journalctl	2026-06-15 16:25:58 +00:00
Tudorel Oprisan	1e9305395e	Fixed llama-server path	2026-06-15 17:01:53 +01:00
root	7e86a30bd8	fix: resolve port conflict between sidecar and llama-server Sidecar and llama-server were both configured on port 8080, causing llama-server to fail on startup (port already in use). - sidecar/app.py: LLAMA_SERVER_PORT → 8081 (sidecar stays on 8080) - docker-compose.yml: MAIN_PC_URL → port 8081 (router sends chat requests to llama-server, not the sidecar)	2026-06-15 15:31:31 +00:00
root	2c23faa4a1	fix: add probe endpoints and no-model fallback for Hermes Desktop compatibility Hermes Desktop sends probe requests to validate providers before allowing model switching. The router was returning 503 for all of these because the catch-all proxy requires a 'model' field in the request body. Added explicit handlers for: - GET /v1/models/{model_id} — OpenAI single-model lookup - GET /api/tags — Ollama model list discovery - POST /api/show — Ollama model info - GET /api/v1/models — Ollama-compatible model list - GET /v1/props, GET /props — llama.cpp server properties - GET /version — llama.cpp version Also fixed the catch-all proxy to route requests with no model body to the currently active backend instead of returning 503.	2026-06-15 15:22:15 +00:00
Tudorel Oprisan	af12370632	changed llama-server location	2026-06-15 16:10:49 +01:00
root	1ef8a497f6	fix: update docker-compose.yml SIDECAR_URL to port 8080	2026-06-15 13:23:09 +00:00
root	45417068ae	fix: change sidecar port from 8081 to 8080 The sidecar is deployed on port 8080 instead of 8081. Update all: - Default SIDECAR_PORT in sidecar/app.py - Default SIDECAR_URL in main.py (router) - deploy/llm-sidecar.service Environment - deploy/README.md (.env example + config table) - All 7 test files (conftest, circuit-breaker, fallback, queue, model-detection, sse-progress, v1-models)	2026-06-15 13:17:31 +00:00
Tudorel Oprisan	b7079fa199	fixed port and conflict	2026-06-15 14:07:18 +01:00
root	e14d2c62da	fix: use venv for sidecar deps, add missing deploy steps - llm-sidecar.service: use /home/bigt/AI/llm/venv/bin/uvicorn instead of global python3 -m uvicorn (avoids 'No module named uvicorn' error) - deploy/README.md: add steps to copy sidecar/ package, create venv, and pip install requirements.txt	2026-06-15 13:02:34 +00:00
Tudorel Oprisan	555a887b4e	fixed port	2026-06-15 13:43:43 +01:00
doru	39a8f09232	Merge pull request 'feat: add 15 model profiles to manifest.yaml' (#18) from feature/add-model-profiles into master Reviewed-on: https://ghituai.chiabur.xyz/doru/intelligence-router/pulls/18	2026-06-15 15:40:48 +03:00
root	e9790c00dc	feat: add 15 model profiles to manifest.yaml - Qwen3.6-27B: 3 profiles (balanced/thinking/extended) - Gemma 4 12B: 4 profiles (Q6_K_XL and IQ4_XS variants) - Gemma 4 26B-A4B: 3 profiles (Q4_K_M and IQ4_XS) - Qwen3.6-35B-A3B: 3 profiles (fast/thinking/extended, non-MTP) - Uncensored: 3 profiles (HauhauCS, Genesis APEX) - Add pytest.ini for test discovery - All profiles use KV cache quantization (q8_0/q4_0) for 64K-128K context - Embedded sampling parameters per model family - Based on research from r/LocalLLaMA, Unsloth benchmarks, HF model cards	2026-06-15 12:34:46 +00:00
root	4914363089	Epic: Model Switching via Sidecar — Issues #4-#7 + #8 deployment Issue #4: Automatic model detection and switch - Router extracts model from chat body, queries sidecar, triggers switch on mismatch - Matching active model routes directly to Main PC - No active model triggers cold start switch - Tests: 4 test_router_model_detection.py Issue #5: SSE switch progress feedback - _sse_format() correctly serializes SSE events - sse_progress_stream() generates phase progression events - Proxy yields SSE events then actual response - Tests: 3 test_router_sse_progress.py Issue #6: Circuit breaker + OpenRouter fallback - Circuit tracks Sidecar failures, opens after MAX_RECOVERY_ATTEMPTS (3) - OpenRouter API key from env, no longer uses x-intelligence-level header - Fixes: OPENROUTER_BASE, SSE format, circuit state isolation - Tests: 7 test_router_circuit_breaker.py Issue #7: LXC fallback chain completion - Full fallback: Main PC → OpenRouter → LXC - Each backend health-checked via /v1/models before routing - All backends down → 503 response - Fixed: execute() wrapped in try/except to trigger fallback chain - Tests: 3 test_router_fallback_lxc.py Issue #8: Systemd service deployment - deploy/llm-sidecar.service: systemd unit with Restart=always - deploy/manifest.yaml: example manifest with 3 profiles - deploy/README.md: deployment instructions - Updated: docker-compose.yml, requirements.txt, Dockerfile Test framework improvements: - tests/conftest.py: shared URL patches for all router tests - Fixed global state pollution in circuit breaker tests - Fixed test sidecar switch test (AsyncMock for async function) Total: 42 tests passing	2026-06-15 01:13:36 +00:00
root	c491779248	Epic: Model Switching via Sidecar — Issues #2-#3 Issue #2: Manifest schema + Sidecar foundation - sidecar/manifest.py: YAML manifest loading and profile validation - sidecar/app.py: FastAPI sidecar service with /models/available, /models/status endpoints - Router GET /v1/models: proxies to sidecar, returns OpenAI-compatible model list - Tests: 12 manifest tests, 6 sidecar endpoint tests, 3 router tests (21 total) Issue #3: Sidecar model switch + Router request queue - Sidecar POST /models/switch: stops current llama-server, starts new one, polls for readiness - Switch lock prevents concurrent switches (threading.Lock for TestClient compatibility) - Router request queue: max 10 requests, 120s hard timeout, 429 when full - Router automatic model detection: extracts model from chat body, matches against sidecar status - Full proxy endpoint with Sidecar → Main PC routing and fallback chain - Tests: 5 sidecar switch tests, 4 queue tests, 3 router integration tests (12 total) Total: 33 tests, all passing	2026-06-15 00:49:24 +00:00
root	b2031d8b7a	Added next changes	2026-06-15 00:09:31 +00:00
Tudorel Oprisan	712fe041b1	test	2026-06-09 19:54:03 +01:00
Tudorel Oprisan	1a7dd550ec	added debug	2026-06-09 18:05:10 +01:00
Tudorel Oprisan	d7090b1644	Fix build context, port conflict, and improve proxy/health-check logic	2026-06-09 17:34:07 +01:00
Tudorel Oprisan	cb01b42f38	Cleanup: Remove redundant llama-slm service and use LXC IP	2026-06-09 12:41:32 +01:00
Tudorel Oprisan	4ea94f7d60	Update IPs for Main PC and LXC Fallback Brain	2026-06-09 12:37:34 +01:00
Chiabur Aiode	8fab2f3801	.env	2026-06-09 13:57:22 +03:00
Chiabur Aiode	038e8f9f7c	gitignore	2026-06-09 13:54:18 +03:00
Tudorel Oprisan	0e05390be2	Initial commit: migrate intelligence-router files	2026-06-09 11:48:43 +01:00

27 Commits