2. Sidecar model switch + Router request queue #3


doru · 2026-06-15T02:48:58+03:00


Parent

#1 Epic: Model Switching via Sidecar

What to build

Add the model switch capability to the Sidecar and the request queue to the Router. This is the core end-to-end slice: a request triggers a switch, subsequent requests queue, and the queue drains when the model is ready.

Sidecar POST /models/switch: body {profile_id}. Stops the current llama-server subprocess, starts a new one with the profile's model_path and flags, then polls localhost:8080/v1/models every 500ms until it is ready. Returns {status: "ready", active_profile} on success or {status: "error", message} on failure. An in-memory switch lock prevents concurrent switches.
Sidecar GET /models/status: updated to return {active_profile: Profile | null, llama_server_running: bool}, based on the actual subprocess state.
Router: adds a request queue (max 10 requests, 120s hard timeout). When the proxy detects that a switch is in progress, it queues the request. It returns 429 when the queue is full and drains the queue once the Sidecar reports ready.
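
The Sidecar switch flow described above (lock, stop, start, poll for readiness) could look roughly like the sketch below. It is not the project's implementation: `SwitchManager`, `SwitchInProgress`, `probe_llama_server`, the injected `stop`/`start` callables, and the 120s `ready_timeout` are all assumptions made for illustration, and `active_profile` is simplified to the profile id rather than a full Profile object.

```python
import http.client
import threading
import time
import urllib.request
from typing import Callable, Optional


class SwitchInProgress(Exception):
    """Another switch holds the lock; a handler could map this to HTTP 409."""


def probe_llama_server(url: str = "http://localhost:8080/v1/models") -> bool:
    """Return True once the readiness URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False  # refused, timed out, or error status: not ready yet


class SwitchManager:
    """Hypothetical helper: serialises switches and waits for readiness."""

    def __init__(self, stop: Callable[[], None], start: Callable[[str], None],
                 probe: Callable[[], bool] = probe_llama_server,
                 poll_interval: float = 0.5,      # 500ms, as in the issue
                 ready_timeout: float = 120.0):   # assumption, not in the issue
        self._stop, self._start, self._probe = stop, start, probe
        self._poll_interval = poll_interval
        self._ready_timeout = ready_timeout
        self._lock = threading.Lock()             # in-memory switch lock
        self.active_profile: Optional[str] = None

    def switch(self, profile_id: str) -> dict:
        # Non-blocking acquire: a second concurrent switch fails fast.
        if not self._lock.acquire(blocking=False):
            raise SwitchInProgress("a model switch is already running")
        try:
            self._stop()                  # stop the current subprocess
            self.active_profile = None
            self._start(profile_id)       # start with the new profile's flags
            deadline = time.monotonic() + self._ready_timeout
            while time.monotonic() < deadline:
                if self._probe():
                    self.active_profile = profile_id
                    return {"status": "ready", "active_profile": profile_id}
                time.sleep(self._poll_interval)
            return {"status": "error",
                    "message": "llama-server did not become ready in time"}
        except Exception as exc:          # surface start/stop failures
            return {"status": "error", "message": str(exc)}
        finally:
            self._lock.release()
```

Injecting `stop`, `start`, and `probe` keeps the lock and polling logic testable without a real llama-server; in the Sidecar these would wrap the actual subprocess handling.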

Acceptance criteria

POST /models/switch stops the current llama-server and starts a new one with the profile's flags
Switch readiness is detected by polling localhost:8080/v1/models every 500ms
Switch lock prevents concurrent switches (a second POST returns 409 or a similar error status)
Router queues requests during a switch (max 10)
Queued requests time out after 120s
Router returns 429 when the queue is full
Queue drains when the Sidecar reports ready
Tests: switch to new profile, switch when already on same profile, readiness detection, queue cap, queue timeout, 429 beyond capacity
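
The Router-side criteria (queue cap, timeout, drain on ready) can be illustrated with a small asyncio gate. This is a sketch under assumptions: the Router's actual language and framework are not stated in this issue, and `SwitchGate`, `QueueFull`, `QueueTimeout`, `begin_switch`, and `end_switch` are hypothetical names.

```python
import asyncio


class QueueFull(Exception):
    """Queue is at capacity; the proxy would answer 429."""


class QueueTimeout(Exception):
    """A queued request waited longer than the hard timeout."""


class SwitchGate:
    """Hypothetical gate: requests wait here while a switch is in progress."""

    def __init__(self, max_queued: int = 10, timeout: float = 120.0) -> None:
        self._max_queued = max_queued
        self._timeout = timeout
        self._ready = asyncio.Event()
        self._ready.set()          # no switch running at start
        self.queued = 0            # requests currently waiting

    def begin_switch(self) -> None:
        self._ready.clear()        # new requests start queueing

    def end_switch(self) -> None:
        self._ready.set()          # Sidecar reported ready: release all waiters

    async def wait_until_ready(self) -> None:
        if self._ready.is_set():
            return                 # no switch in progress, pass straight through
        if self.queued >= self._max_queued:
            raise QueueFull("request queue is full")
        self.queued += 1
        try:
            await asyncio.wait_for(self._ready.wait(), self._timeout)
        except asyncio.TimeoutError:
            raise QueueTimeout("timed out waiting for model switch") from None
        finally:
            self.queued -= 1
```

A proxy handler would `await gate.wait_until_ready()` before forwarding and map `QueueFull` to 429. The status code for `QueueTimeout` is not specified in this issue. Note that an `asyncio.Event` wakes all waiters at once, so this sketch does not guarantee strict first-in, first-out ordering when the queue drains.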

Blocked by

#2 (Manifest schema + Sidecar foundation)

User stories covered

2, 3, 11, 12, 13, 16


doru added the type:afk and triage:ready labels 2026-06-15 02:48:58 +03:00

doru referenced this issue 2026-06-15 02:48:58 +03:00

3. Automatic model detection and switch #4

doru referenced this issue from a commit 2026-06-15 03:49:31 +03:00

Epic: Model Switching via Sidecar — Issues #2-#3
