# LLM Providers & Serve Gateway
Rnix supports multiple LLM providers through declarative configuration and exposes them as an OpenAI-compatible HTTP API gateway.
## Multi-Provider Configuration
### providers.yaml
Define LLM providers declaratively in `~/.config/rnix/providers.yaml` (global) or `.rnix/providers.yaml` (project override). The daemon parses this file at startup and registers each provider as a VFS device at `/dev/llm/<name>`.
```yaml
version: "1"
default_provider: claude

providers:
  - name: claude
    driver: claude-cli
    default_model: haiku
  - name: cursor
    driver: cursor-cli
    command: agent   # CLI binary name (default: "agent")
  - name: ollama
    driver: openai-compat
    base_url: http://localhost:11434/v1
    default_model: llama3
  - name: groq
    driver: openai-compat
    base_url: https://api.groq.com/openai/v1
    api_key_env: GROQ_API_KEY
    default_model: llama-3.3-70b-versatile
  - name: deepseek
    driver: openai-compat
    base_url: https://api.deepseek.com/v1
    api_key_env: DEEPSEEK_API_KEY
    default_model: deepseek-chat
  - name: gemini
    driver: gemini
    api_key_env: GOOGLE_API_KEY
    default_model: gemini-2.0-flash
  - name: openai
    driver: openai
    api_key_env: OPENAI_API_KEY
    default_model: gpt-4o
  - name: anthropic-api
    driver: anthropic
    api_key_env: ANTHROPIC_API_KEY
    default_model: claude-sonnet-4-20250514
  - name: qwen
    driver: qwen-cli
    default_model: qwen3-coder
```

### Driver Types
| Driver | How It Works | Examples |
|---|---|---|
| `claude-cli` | Invokes the Claude Code CLI (`claude -p`) | Anthropic Claude |
| `cursor-cli` | Invokes the Cursor CLI (`agent --print`) | Cursor |
| `openai-compat` | Calls an OpenAI-compatible HTTP API endpoint | Ollama, Groq, DeepSeek, any OpenAI-compatible server |
| `qwen-cli` | Invokes the Qwen Code CLI (`qwen --chat`) | Qwen Code |
| `openai` | Official OpenAI SDK (`github.com/openai/openai-go/v3`) | OpenAI GPT-4, GPT-4o |
| `gemini` | Native Gemini API (`google.golang.org/genai`) | Google Gemini |
| `anthropic` | Official Anthropic SDK (`anthropic-sdk-go`) | Claude (via API, not CLI) |
### CLI Command Alias
CLI drivers invoke a binary to interact with the LLM. Use the `command` field to override the default binary name:
| Driver | Default Command |
|---|---|
| `claude-cli` | `claude` |
| `cursor-cli` | `agent` |
| `qwen-cli` | `qwen` |
```yaml
- name: cursor
  driver: cursor-cli
  command: cursor-agent   # Override default "agent"
```

### Provider Resolution
When spawning an agent, the provider is resolved in this order:
1. `--provider` CLI flag (highest priority)
2. `agent.yaml` → `models.provider`
3. `providers.yaml` → `default_provider`
4. Built-in default: `claude`
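For example, an `agent.yaml` can pin a provider for a project while the `--provider` flag still wins for a single run (a minimal sketch; values illustrative):

```yaml
# agent.yaml: consulted at step 2 when no --provider flag is given
models:
  provider: ollama   # omit this to fall through to providers.yaml default_provider
```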
### Provider Fallback
When the preferred provider fails (HTTP 5xx, connection timeout, auth failure), the system automatically tries the fallback:
```yaml
# agent.yaml
models:
  provider: groq            # Primary
  preferred: llama-3.3-70b
  fallback: ollama          # Fallback provider
```

### Health Check
```bash
$ rnix providers status
Provider   Driver   Status    Model              Latency
claude     cli      healthy   sonnet             -
cursor     cli      healthy   claude-3.5-sonnet  -
ollama     http     healthy   llama3             45ms
groq       http     healthy   llama-3.3-70b      120ms
deepseek   http     offline   deepseek-chat      timeout
```

### Advanced Provider Options
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `"stream"` | Response mode: `"stream"` for SSE streaming, `"call"` for single-shot response |
| `max_tokens` | int | `0` | Maximum output tokens per LLM call; `0` uses the API default |
| `cost_per_token` | float64 | `0` | Per-token cost in USD for budget tracking; `0` disables cost tracking |
| `thinking_budget` | int | `0` | Thinking budget in tokens (`gemini` driver only); `0` disables thinking |
| `extra_args` | string[] | `[]` | Additional CLI arguments passed to the binary (`claude-cli`, `cursor-cli`, `qwen-cli` only) |
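Several of these fields can be combined on a single provider entry. A sketch of an `openai-compat` provider with capped output and budget tracking (the `cost_per_token` value is purely illustrative, not a real price):

```yaml
- name: groq
  driver: openai-compat
  base_url: https://api.groq.com/openai/v1
  api_key_env: GROQ_API_KEY
  default_model: llama-3.3-70b-versatile
  mode: call                  # single-shot response instead of SSE streaming
  max_tokens: 4096            # cap output tokens per call (0 = API default)
  cost_per_token: 0.0000008   # illustrative per-token USD cost for budget tracking
```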
Example — Gemini with thinking budget:
```yaml
- name: gemini-thinking
  driver: gemini
  api_key_env: GOOGLE_API_KEY
  default_model: gemini-2.5-pro
  thinking_budget: 8192
```

### API Key Management
HTTP providers reference API keys via environment variables — keys are never stored in config files:
```yaml
- name: groq
  driver: openai-compat
  api_key_env: GROQ_API_KEY   # Reads $GROQ_API_KEY at runtime
```

API keys are resolved in this order:
1. Project `.env` files — loaded from the project root when `.rnix/` exists (`.env` → `.env.local` → `.env.{RNIX_ENV}` → `.env.{RNIX_ENV}.local`)
2. Daemon process environment — `os.Getenv` fallback
This means you can define API keys per-project without polluting the daemon's global environment. See Configuration > Environment Files for .env syntax and loading order.
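For example, a project-local `.env` can hold a key that only that project's agents see (placeholder value shown):

```bash
# myproject/.env
GROQ_API_KEY=<your-groq-api-key>
```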
### Project-Level Provider Overrides
A project can override or extend global providers by placing a `providers.yaml` in `.rnix/`:
`myproject/.rnix/providers.yaml`

Project providers are deep-merged with global providers — you can override specific fields (like `api_key_env` or `default_model`) without redefining the entire provider list. Project-level providers that don't exist globally are added as new providers, available only in that project.
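A sketch of such an override, assuming the global `ollama` and `groq` entries shown earlier; only the fields being changed need to appear, and `MYPROJECT_GROQ_KEY` is a hypothetical project-specific variable:

```yaml
# myproject/.rnix/providers.yaml
providers:
  - name: ollama
    default_model: qwen3-coder       # override the model; driver and base_url merge from the global entry
  - name: groq
    api_key_env: MYPROJECT_GROQ_KEY  # hypothetical env var used only in this project
```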
## LLM Serve Gateway
### Overview
`rnix serve` starts an OpenAI-compatible HTTP server that exposes all registered providers as standard API endpoints. External tools (VS Code extensions, web UIs, other applications) can consume LLM capabilities without understanding Rnix internals.
```bash
$ rnix serve --port 8080
[serve] starting OpenAI-compatible API server on http://localhost:8080
[serve] registered providers: claude, cursor, ollama, groq
[serve] endpoints: /v1/chat/completions, /v1/models, /health
```

Flags:
| Flag | Default | Description |
|---|---|---|
| `--port` | 8080 | HTTP listen port |
The server binds to 127.0.0.1 (localhost only). Request body size is limited to 4 MB. On startup, the server runs health checks on all providers (3s timeout per provider).
### Endpoints
#### POST /v1/chat/completions
Standard OpenAI Chat Completions API. The `model` parameter routes to the corresponding VFS LLM driver:
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (provider name or `provider:model`) |
| `messages` | array | Yes | Message array (`{role, content}`), at least 1 message |
| `stream` | bool | No | Enable SSE streaming response |
| `temperature` | float64 | No | Sampling temperature |
| `max_tokens` | int | No | Maximum tokens to generate |
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Model routing:
| Model String | Resolution |
|---|---|
| `"ollama"` | `/dev/llm/ollama` → uses provider's `default_model` |
| `"groq:llama-3.3-70b"` | `/dev/llm/groq` with explicit model |
| `"llama-3.3-70b"` | Reverse lookup: finds a provider whose `default_model` matches |
| `"unknown-model"` | Falls back to `default_provider`, input treated as model name |
If no provider can be resolved, the server returns 404 with the list of available providers.
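For example, the `provider:model` form targets a specific model on a named provider:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq:llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```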
Streaming — set `"stream": true` for SSE responses:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```

The stream terminates with `data: [DONE]\n\n` per the OpenAI SSE spec.
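Because the gateway speaks the OpenAI wire format, standard OpenAI client libraries can point at it directly. A minimal sketch with the official OpenAI Python SDK, assuming the gateway runs on the default port and does not check API keys (a placeholder key is passed only to satisfy the client):

```python
from openai import OpenAI

# Point the client at the local rnix serve gateway instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # placeholder key

stream = client.chat.completions.create(
    model="ollama",  # provider name, or "provider:model" for an explicit model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```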
Response format (non-streaming):
```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1711600000,
  "model": "ollama",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```

#### GET /v1/models
Lists all registered providers and their available models. Unhealthy providers are excluded; providers that have not yet been health-checked are included. Each provider with a `default_model` generates two entries:
```json
{
  "object": "list",
  "data": [
    {"id": "claude", "object": "model", "created": 1711600000, "owned_by": "claude"},
    {"id": "claude:haiku", "object": "model", "created": 1711600000, "owned_by": "claude"},
    {"id": "ollama", "object": "model", "created": 1711600000, "owned_by": "ollama"},
    {"id": "ollama:llama3", "object": "model", "created": 1711600000, "owned_by": "ollama"}
  ]
}
```

#### GET /health
Health check endpoint for monitoring and load balancers:
```json
{"status": "ok", "providers": 4}
```

### Error Responses
All errors follow the OpenAI error format:
```json
{
  "error": {
    "message": "Provider 'xyz' not found. Available providers: claude, cursor",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```

| HTTP Status | Code | Scenario |
|---|---|---|
| 400 | `invalid_request` | Invalid JSON, missing `model` or `messages` |
| 404 | `model_not_found` | Provider not found |
| 502 | `upstream_error` | LLM driver returned an error or empty response |
| 504 | `timeout` | LLM request timed out |
### Architecture
The serve gateway shares the daemon's driver instances and `providers.yaml` configuration. Adding or changing a provider only requires editing the config and restarting the daemon.
```
External Tool → HTTP → rnix serve → VFS /dev/llm/* → Provider Driver → LLM
```

## Related Documentation
- Configuration — All configuration files
- Agents & Skills — Provider selection in agent manifests
- Reference Manual — VFS path specification for /dev/llm/*