The circuit breaker opened after MAX_RECOVERY_ATTEMPTS failures but was never reset, because the sidecar status query (which calls circuit_reset()) was skipped whenever the circuit was open. This caused a permanent deadlock: every subsequent request went to the LXC fallback with no path to recovery. Fix: always query the sidecar for /models/status, even when the circuit is open. If the sidecar responds successfully, reset the circuit. The circuit breaker now blocks only the SWITCH operation, not the status health check. If a model is already running when the circuit is open, route to it directly.
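
The routing rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: only `MAX_RECOVERY_ATTEMPTS` and `circuit_reset()` are named in the source. The value `3`, the `CircuitBreaker` class, `route_request`, the `query_status` and `switch_model` callables, the exception types, and the returned route strings are all assumptions made for the example. It also assumes that only failed switches count toward opening the circuit, and that the reset happens only when the circuit was open.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RECOVERY_ATTEMPTS = 3      # assumed value; the source names the constant only
LXC_FALLBACK = "lxc-fallback"  # hypothetical label for the fallback route


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: opens after MAX_RECOVERY_ATTEMPTS recorded failures."""
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= MAX_RECOVERY_ATTEMPTS

    def record_failure(self) -> None:
        self.failures += 1

    def circuit_reset(self) -> None:
        self.failures = 0


def route_request(
    breaker: CircuitBreaker,
    wanted_model: str,
    query_status: Callable[[], Optional[str]],  # stands in for GET /models/status
    switch_model: Callable[[str], None],        # stands in for the SWITCH operation
) -> str:
    was_open = breaker.is_open

    # Always run the health check, even when the circuit is open.
    try:
        running = query_status()  # name of the running model, or None
    except ConnectionError:
        return LXC_FALLBACK       # sidecar unreachable; circuit state unchanged

    # The sidecar answered, so an open circuit can recover.
    if was_open:
        breaker.circuit_reset()

    if running == wanted_model:
        return f"sidecar:{running}"

    # An open circuit blocks only the switch: use whatever model is
    # already running, or fall back if nothing is running.
    if was_open:
        return f"sidecar:{running}" if running is not None else LXC_FALLBACK

    try:
        switch_model(wanted_model)
    except RuntimeError:
        breaker.record_failure()
        return LXC_FALLBACK
    return f"sidecar:{wanted_model}"
```

The point of the sketch is the ordering: the status query runs before any check of the circuit state, so a healthy sidecar can always close the circuit again, and the open state only ever suppresses the switch attempt.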

| Name |
|---|
| .hermes/plans |
| deploy |
| docs |
| scripts |
| sidecar |
| tests |
| .env |
| .gitignore |
| CONTEXT.md |
| docker-compose.yml |
| Dockerfile |
| main.py |
| pytest.ini |
| requirements.txt |