2. Sidecar model switch + Router request queue #3

Open
opened 2026-06-15 02:48:58 +03:00 by doru · 0 comments

Parent

  • #1 Epic: Model Switching via Sidecar

What to build

Add the model switch capability to the Sidecar and the request queue to the Router. This is the core end-to-end slice: a request triggers a switch, subsequent requests queue, and the queue drains when the model is ready.

  • Sidecar POST /models/switch: body {profile_id}. Stops the current llama-server subprocess, starts a new one with the profile's model_path and flags, then polls localhost:8080/v1/models every 500ms until the server is ready. Returns {status: "ready", active_profile} on success or {status: "error", message} on failure. An in-memory switch lock prevents concurrent switches.
  • Sidecar GET /models/status: updated to return {active_profile: Profile | null, llama_server_running: bool}, based on the actual subprocess state.
  • Router: adds a request queue (max 10 requests, 120s hard timeout per queued request). When the proxy detects that a switch is in progress, it queues the incoming request. It returns 429 when the queue is full and drains the queue once the Sidecar reports ready.
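
The Sidecar switch flow described above (lock, stop, start, poll for readiness) could be sketched roughly as follows. This is only an illustration, not the planned implementation: `ModelSwitcher`, `SwitchInProgress`, the injected `stop`/`start`/`is_ready` callables, and the `ready_timeout` parameter are all hypothetical names, and the issue does not specify an upper bound on readiness polling. In a real Sidecar, `is_ready` would presumably issue a GET to localhost:8080/v1/models, and `SwitchInProgress` would map to the 409-style response.

```python
import threading
import time
from typing import Callable, Optional


class SwitchInProgress(Exception):
    """Raised when a switch is requested while another one is running."""


class ModelSwitcher:
    def __init__(
        self,
        stop: Callable[[], None],        # hypothetical: stops the llama-server subprocess
        start: Callable[[dict], None],   # hypothetical: starts it with the profile's flags
        is_ready: Callable[[], bool],    # hypothetical: one readiness probe
        poll_interval: float = 0.5,      # 500ms, as in the issue
        ready_timeout: float = 120.0,    # assumption: not specified in the issue
    ) -> None:
        self._stop = stop
        self._start = start
        self._is_ready = is_ready
        self._poll_interval = poll_interval
        self._ready_timeout = ready_timeout
        self._lock = threading.Lock()    # in-memory switch lock
        self.active_profile: Optional[dict] = None

    def switch(self, profile: dict) -> dict:
        # Non-blocking acquire: a second concurrent switch fails fast.
        if not self._lock.acquire(blocking=False):
            raise SwitchInProgress()
        try:
            self._stop()
            self.active_profile = None
            self._start(profile)
            deadline = time.monotonic() + self._ready_timeout
            while time.monotonic() < deadline:
                if self._is_ready():
                    self.active_profile = profile
                    return {"status": "ready", "active_profile": profile}
                time.sleep(self._poll_interval)
            return {"status": "error", "message": "llama-server did not become ready"}
        except Exception as exc:
            return {"status": "error", "message": str(exc)}
        finally:
            self._lock.release()
```

Injecting the three callables keeps the lock and polling logic testable without a real subprocess or network access.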

Acceptance criteria

  • POST /models/switch stops current llama-server and starts new one with profile flags
  • Switch readiness detection via polling localhost:8080/v1/models every 500ms
  • Switch lock prevents concurrent switches (second POST returns 409 or similar)
  • Router queues requests during switch (max 10)
  • Queued requests time out after 120s
  • Router returns 429 when queue is full
  • Queue drains when Sidecar reports ready
  • Tests: switch to new profile, switch when already on same profile, readiness detection, queue cap, queue timeout, 429 beyond capacity
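
One way the Router-side criteria (queue cap, per-request timeout, 429 when full, drain on ready) might fit together is sketched below with asyncio. The names `SwitchGate`, `QueueFull`, `begin_switch`, and `end_switch` are hypothetical, and how the Router actually learns that a switch has started or finished is not covered by this sketch. It also assumes the gate is constructed inside the running event loop.

```python
import asyncio


class QueueFull(Exception):
    """Hypothetical error the Router would translate into HTTP 429."""


class SwitchGate:
    def __init__(self, max_queued: int = 10, timeout: float = 120.0) -> None:
        self._max_queued = max_queued    # queue cap from the issue
        self._timeout = timeout          # hard timeout per queued request
        self._ready = asyncio.Event()
        self._ready.set()                # no switch in progress initially
        self._waiting = 0                # number of requests currently queued

    def begin_switch(self) -> None:
        self._ready.clear()

    def end_switch(self) -> None:
        # Setting the event releases every queued request (the "drain").
        self._ready.set()

    async def wait_until_ready(self) -> None:
        if self._ready.is_set():
            return                       # fast path: proxy immediately
        if self._waiting >= self._max_queued:
            raise QueueFull()
        self._waiting += 1
        try:
            # Raises asyncio.TimeoutError if the switch outlasts the timeout.
            await asyncio.wait_for(self._ready.wait(), timeout=self._timeout)
        finally:
            self._waiting -= 1
```

A proxy handler would call `await gate.wait_until_ready()` before forwarding each request, answering 429 on `QueueFull` and a timeout status on `asyncio.TimeoutError`.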

Blocked by

  • #2 (Manifest schema + Sidecar foundation)

User stories covered

2, 3, 11, 12, 13, 16

doru added the type:afk and triage:ready labels 2026-06-15 02:48:58 +03:00
Reference: doru/intelligence-router#3