2026-06-15 04:13:36 +03:00
# LLM Sidecar — Deployment Guide

## Quick Install

On the Main PC, run the following from the repository root (the `cp` commands use relative paths), as the user that owns `/home/bigt`:

```bash
# 1. Copy the service file
sudo cp deploy/llm-sidecar.service /etc/systemd/system/

# 2. Create the working directory and copy files
mkdir -p /home/bigt/AI/llm
cp deploy/manifest.yaml /home/bigt/AI/llm/manifest.yaml
# Copy the sidecar Python package (app.py + manifest.py);
# this lands in /home/bigt/AI/llm/sidecar, also when re-run
cp -r sidecar /home/bigt/AI/llm/
# Copy requirements.txt for the venv
cp requirements.txt /home/bigt/AI/llm/

# 3. Create a Python virtual environment with dependencies
python3 -m venv /home/bigt/AI/llm/venv
/home/bigt/AI/llm/venv/bin/pip install -r /home/bigt/AI/llm/requirements.txt

# 4. Create a .env for the sidecar (optional; the values below are the defaults)
cat > /home/bigt/AI/llm/.env << 'EOF'
# Sidecar configuration
MANIFEST_PATH=/home/bigt/AI/llm/manifest.yaml
SIDECAR_PORT=8081
EOF

# 5. Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llm-sidecar

# 6. Verify it's running
sudo systemctl status llm-sidecar
```
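The `cp` commands in step 2 use paths relative to the repository root, so running the script from the wrong directory fails partway through. A preflight check like the sketch below catches that before anything is copied. The function name `preflight_check` is hypothetical; the file list is taken from steps 1 to 3 above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repo): check that the files
# the Quick Install copies exist under the given repo root (default: ".").
preflight_check() {
  local root="${1:-.}"
  local missing=0
  local path
  # Sources used by steps 1 and 2.
  for path in deploy/llm-sidecar.service deploy/manifest.yaml \
              sidecar/app.py sidecar/manifest.py requirements.txt; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # Step 3 needs python3 to create the venv.
  if ! command -v python3 >/dev/null 2>&1; then
    echo "missing: python3" >&2
    missing=1
  fi
  return "$missing"
}
```

Run `preflight_check && echo ok` from the repo root before step 1; on failure it returns non-zero and lists each missing item on stderr.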
## Verify

```bash
# Check that the sidecar is responding
curl http://10.0.4.11:8081/models/available

# Check model status
curl http://10.0.4.11:8081/models/status

# Test the router
curl http://10.0.4.100:9001/v1/models
```
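Right after `systemctl enable --now`, the sidecar may need a few seconds before it answers HTTP, so a one-shot `curl` can fail spuriously in scripts. A minimal retry loop is sketched below. The function name and the default retry count and delay are assumptions; it only checks that the URL returns an HTTP 2xx status, not what the response contains.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a sidecar URL until it answers with HTTP 2xx.
# Usage: wait_for_sidecar URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_sidecar() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP 4xx/5xx as failure; -s: no progress output;
    # --max-time: do not hang on an unresponsive host.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "sidecar ready after $i attempt(s)"
      return 0
    fi
    # No sleep after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "sidecar not ready at $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_sidecar http://10.0.4.11:8081/models/status` retries up to 30 times, 2 seconds apart, before giving up with a non-zero exit status.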
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MANIFEST_PATH` | `/home/bigt/AI/llm/manifest.yaml` | Path to the YAML manifest file |
| `SIDECAR_PORT` | `8081` | Port the sidecar listens on |
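
The `.env` from step 4 simply restates these defaults. Helper scripts that need the same values (for example, a script that curls the sidecar) can resolve them the same way. The sketch below is hypothetical: it assumes the `.env` holds plain `KEY=VALUE` lines as written in step 4, and it does not claim to mirror how the sidecar itself reads its configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the effective MANIFEST_PATH and SIDECAR_PORT.
# Values come from an optional .env file (or the caller's environment),
# falling back to the defaults listed in the table.
sidecar_config() {
  local env_file="${1:-/home/bigt/AI/llm/.env}"
  (
    # Subshell: sourcing the .env must not change the caller's variables.
    if [ -f "$env_file" ]; then
      . "$env_file"
    fi
    echo "MANIFEST_PATH=${MANIFEST_PATH:-/home/bigt/AI/llm/manifest.yaml}"
    echo "SIDECAR_PORT=${SIDECAR_PORT:-8081}"
  )
}
```

`sidecar_config` prints two `KEY=VALUE` lines; a `.env` that sets only `SIDECAR_PORT=9090` changes the second line and leaves `MANIFEST_PATH` at its default.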
### Manifest Format

```yaml
- id: model-id
  name: "Display Name"
  model_path: "/path/to/model.gguf"
  flags:              # Arbitrary llama-server flags
    n_ctx: 8192
    n_gpu_layers: 35
```
- `id`: Unique identifier; clients pass it as the `model` field in chat completion requests
- `name`: Human-readable display name
- `model_path`: Absolute path to the GGUF file
- `flags`: Any llama-server CLI flags (n_ctx, n_gpu_layers, etc.)
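
A wrong `model_path` usually only shows up when the sidecar tries to start llama-server for that model. The sketch below checks every `model_path` in the manifest before a restart. It is a hypothetical helper that assumes the layout shown above, with each `model_path:` on its own line and no `"` or `#` inside the path; a real YAML parser is more robust.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report manifest entries whose model_path is missing.
# Assumes `model_path: "..."` (quoted or not) sits on its own line.
check_model_paths() {
  local manifest="${1:-/home/bigt/AI/llm/manifest.yaml}"
  local missing=0
  local path
  [ -f "$manifest" ] || { echo "manifest not found: $manifest" >&2; return 2; }
  # sed extracts the path: optional quotes stripped, trailing comment ignored.
  while IFS= read -r path; do
    if [ ! -f "$path" ]; then
      echo "missing model file: $path" >&2
      missing=1
    fi
  done < <(sed -n -E 's/^[[:space:]]*model_path:[[:space:]]*"?([^"#]*[^"#[:space:]])"?.*$/\1/p' "$manifest")
  return "$missing"
}
```

`check_model_paths` (defaulting to `/home/bigt/AI/llm/manifest.yaml`) prints one `missing model file:` line per bad path and returns non-zero if any are missing.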
## Managing the Service

```bash
# Start / stop / restart
sudo systemctl start llm-sidecar
sudo systemctl stop llm-sidecar
sudo systemctl restart llm-sidecar

# Follow the logs
sudo journalctl -u llm-sidecar -f

# Check status
sudo systemctl status llm-sidecar

# Disable auto-start at boot (does not stop a running instance)
sudo systemctl disable llm-sidecar
```
## Troubleshooting

- **Sidecar not starting**: Check the recent logs with `sudo journalctl -u llm-sidecar -n 50`
- **Manifest errors**: Check that the YAML is valid with `python3 -c "import yaml; yaml.safe_load(open('/home/bigt/AI/llm/manifest.yaml'))"` (needs the PyYAML package); it prints nothing on success and a parse error otherwise
- **llama-server crashes**: The sidecar restarts it automatically, up to 3 times, before the circuit breaker opens
- **Port conflict**: Change `SIDECAR_PORT` in the service environment, then apply it with `sudo systemctl daemon-reload && sudo systemctl restart llm-sidecar`
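
To confirm a port conflict before changing `SIDECAR_PORT`, `sudo ss -ltnp 'sport = :8081'` (iproute2) shows which process is listening on the port. For a scripted yes/no check, the bash-only sketch below attempts a TCP connection to 127.0.0.1 instead. The function name is hypothetical, and it only sees listeners that accept connections on the loopback address.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on
# 127.0.0.1:PORT (default 8081). Uses bash's /dev/tcp redirection.
port_in_use() {
  local port="${1:-8081}"
  # Reject non-numeric input before it reaches the bash -c string below.
  [[ "$port" =~ ^[0-9]+$ ]] || return 2
  # The connect runs in a child bash; its fd closes when that process exits.
  # timeout caps the wait in case packets are silently dropped.
  timeout 2 bash -c "exec 3<>/dev/tcp/127.0.0.1/$port" 2>/dev/null
}
```

Run `port_in_use 8081 && echo 'port 8081 is taken'` with the sidecar stopped; otherwise it reports the sidecar's own listener.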