
LLM Providers & Serve Gateway

Rnix supports multiple LLM providers through declarative configuration and exposes them as an OpenAI-compatible HTTP API gateway.


Multi-Provider Configuration

providers.yaml

Define LLM providers declaratively in ~/.config/rnix/providers.yaml (global) or .rnix/providers.yaml (project override). The daemon parses this at startup and registers each as a VFS device at /dev/llm/<name>.

yaml
version: "1"
default_provider: claude

providers:
  - name: claude
    driver: claude-cli
    default_model: haiku

  - name: cursor
    driver: cursor-cli
    command: agent              # CLI binary name (default: "agent")

  - name: ollama
    driver: openai-compat
    base_url: http://localhost:11434/v1
    default_model: llama3

  - name: groq
    driver: openai-compat
    base_url: https://api.groq.com/openai/v1
    api_key_env: GROQ_API_KEY
    default_model: llama-3.3-70b-versatile

  - name: deepseek
    driver: openai-compat
    base_url: https://api.deepseek.com/v1
    api_key_env: DEEPSEEK_API_KEY
    default_model: deepseek-chat

  - name: gemini
    driver: gemini
    api_key_env: GOOGLE_API_KEY
    default_model: gemini-2.0-flash

  - name: openai
    driver: openai
    api_key_env: OPENAI_API_KEY
    default_model: gpt-4o

  - name: anthropic-api
    driver: anthropic
    api_key_env: ANTHROPIC_API_KEY
    default_model: claude-sonnet-4-20250514

  - name: qwen
    driver: qwen-cli
    default_model: qwen3-coder

Driver Types

| Driver | How It Works | Examples |
| --- | --- | --- |
| claude-cli | Invokes Claude Code CLI (claude -p) | Anthropic Claude |
| cursor-cli | Invokes Cursor CLI (agent --print) | Cursor |
| openai-compat | Calls OpenAI-compatible HTTP API endpoint | Ollama, Groq, DeepSeek, any OpenAI-compatible server |
| qwen-cli | Invokes Qwen Code CLI (qwen --chat) | Qwen Code |
| openai | Official OpenAI SDK (github.com/openai/openai-go/v3) | OpenAI GPT-4, GPT-4o |
| gemini | Native Gemini API (google.golang.org/genai) | Google Gemini |
| anthropic | Official Anthropic SDK (anthropic-sdk-go) | Claude (via API, not CLI) |

CLI Command Alias

CLI drivers invoke a binary to interact with the LLM. Use the command field to override the default binary name:

| Driver | Default Command |
| --- | --- |
| claude-cli | claude |
| cursor-cli | agent |
| qwen-cli | qwen |

yaml
- name: cursor
  driver: cursor-cli
  command: cursor-agent   # Override default "agent"

Provider Resolution

When spawning an agent, the provider is resolved in this order (see the sketch after the list):

  1. --provider CLI flag (highest priority)
  2. models.provider in agent.yaml
  3. default_provider in providers.yaml
  4. Built-in default: claude
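
A minimal agent.yaml sketch for step 2 (the field layout follows the fallback example later on this page; the provider and model values are placeholders):

yaml
# agent.yaml: selects the provider when no --provider flag is given
models:
  provider: ollama     # step 2: agent-level choice
  preferred: llama3    # preferred model for that provider

Passing --provider at spawn time overrides this; if neither the flag nor agent.yaml sets a provider, default_provider from providers.yaml (step 3) and finally the built-in claude (step 4) apply.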

Provider Fallback

When the preferred provider fails (HTTP 5xx, connection timeout, auth failure), the system automatically tries the fallback:

yaml
# agent.yaml
models:
  provider: groq          # Primary
  preferred: llama-3.3-70b
  fallback: ollama         # Fallback provider

Health Check

bash
$ rnix providers status
Provider     Driver  Status    Model              Latency
claude       cli     healthy   sonnet             -
cursor       cli     healthy   claude-3.5-sonnet  -
ollama       http    healthy   llama3             45ms
groq         http    healthy   llama-3.3-70b      120ms
deepseek     http    offline   deepseek-chat      timeout

Advanced Provider Options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "stream" | Response mode: "stream" for SSE streaming, "call" for single-shot response |
| max_tokens | int | 0 | Maximum output tokens per LLM call; 0 uses the API default |
| cost_per_token | float64 | 0 | Per-token cost in USD for budget tracking; 0 disables cost tracking |
| thinking_budget | int | 0 | Thinking budget in tokens (gemini driver only); 0 disables thinking |
| extra_args | string[] | [] | Additional CLI arguments passed to the binary (claude-cli, cursor-cli, qwen-cli only) |

Example — Gemini with thinking budget:

yaml
- name: gemini-thinking
  driver: gemini
  api_key_env: GOOGLE_API_KEY
  default_model: gemini-2.5-pro
  thinking_budget: 8192
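
As a further illustration, the same fields can cap output length, enable budget tracking, and forward extra flags to a CLI driver; the numeric values and the extra argument below are placeholders, not recommendations:

yaml
- name: groq
  driver: openai-compat
  base_url: https://api.groq.com/openai/v1
  api_key_env: GROQ_API_KEY
  default_model: llama-3.3-70b-versatile
  max_tokens: 4096              # cap output tokens per call; 0 keeps the API default
  cost_per_token: 0.00000079    # USD per token, enables budget tracking

- name: claude
  driver: claude-cli
  default_model: haiku
  extra_args: ["--verbose"]     # appended to the claude CLI invocation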

API Key Management

HTTP providers reference API keys via environment variables — keys are never stored in config files:

yaml
- name: groq
  driver: openai-compat
  api_key_env: GROQ_API_KEY   # Reads $GROQ_API_KEY at runtime

API keys are resolved in this order:

  1. Project .env files, loaded from the project root when .rnix/ exists: .env, .env.local, .env.{RNIX_ENV}, .env.{RNIX_ENV}.local
  2. Daemon process environment (os.Getenv fallback)

This means you can define API keys per-project without polluting the daemon's global environment. See Configuration > Environment Files for .env syntax and loading order.
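
For example, a project-local env file (a sketch; any of the filenames listed above works, and the key values are placeholders) can hold the keys referenced by api_key_env:

bash
# myproject/.env.local
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxx
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxx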

Project-Level Provider Overrides

A project can override or extend global providers by placing a providers.yaml in .rnix/:

myproject/.rnix/providers.yaml

Project providers are deep-merged with global providers — you can override specific fields (like api_key_env or default_model) without redefining the entire provider list. Project-level providers that don't exist globally are added as new providers available only in that project.
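
A minimal sketch of such an override, assuming the global configuration shown earlier on this page (the added provider, URL, and model names are placeholders):

yaml
# myproject/.rnix/providers.yaml
version: "1"

providers:
  - name: ollama                 # exists globally: only this field is overridden
    default_model: codellama

  - name: local-vllm             # not defined globally: available only in this project
    driver: openai-compat
    base_url: http://localhost:8000/v1
    default_model: qwen2.5-coder

All other global providers remain available in the project unchanged.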


LLM Serve Gateway

Overview

rnix serve starts an OpenAI-compatible HTTP server that exposes all registered providers as standard API endpoints. External tools (VS Code extensions, web UIs, other applications) can consume LLM capabilities without understanding Rnix internals.

bash
$ rnix serve --port 8080
[serve] starting OpenAI-compatible API server on http://localhost:8080
[serve] registered providers: claude, cursor, ollama, groq
[serve] endpoints: /v1/chat/completions, /v1/models, /health

Flags:

| Flag | Default | Description |
| --- | --- | --- |
| --port | 8080 | HTTP listen port |

The server binds to 127.0.0.1 (localhost only). Request body size is limited to 4 MB. On startup, the server runs health checks on all providers (3s timeout per provider).

Endpoints

POST /v1/chat/completions

Standard OpenAI Chat Completions API. The model parameter routes the request to the corresponding VFS LLM driver.

Request body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (provider name or provider:model) |
| messages | array | Yes | Message array ({role, content}), at least 1 message |
| stream | bool | No | Enable SSE streaming response |
| temperature | float64 | No | Sampling temperature |
| max_tokens | int | No | Maximum tokens to generate |

bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Model routing:

| Model String | Resolution |
| --- | --- |
| "ollama" | /dev/llm/ollama → uses provider's default_model |
| "groq:llama-3.3-70b" | /dev/llm/groq with explicit model |
| "llama-3.3-70b" | Reverse lookup: finds a provider whose default_model matches |
| "unknown-model" | Falls back to default_provider, input treated as model name |

If no provider is found, returns 404 with available provider list.
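
For example, the provider:model form pins both the provider and the model in a single request:

bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq:llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'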

Streaming — set "stream": true for SSE responses:

bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

The stream terminates with data: [DONE]\n\n, per the OpenAI SSE spec.
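
Each streamed event is an OpenAI-style chat.completion.chunk carrying a delta; a sketch of what a client sees (exact fields may vary by provider) looks like:

text
data: {"object":"chat.completion.chunk","model":"ollama","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","model":"ollama","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]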

Response format (non-streaming):

json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1711600000,
  "model": "ollama",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}

GET /v1/models

Lists all registered providers and their available models. Unhealthy providers are excluded; providers that have not yet been health-checked are included. Each provider with a default_model generates two entries, the bare provider name and the provider:model form:

json
{
  "object": "list",
  "data": [
    {"id": "claude", "object": "model", "created": 1711600000, "owned_by": "claude"},
    {"id": "claude:haiku", "object": "model", "created": 1711600000, "owned_by": "claude"},
    {"id": "ollama", "object": "model", "created": 1711600000, "owned_by": "ollama"},
    {"id": "ollama:llama3", "object": "model", "created": 1711600000, "owned_by": "ollama"}
  ]
}

GET /health

Health check endpoint for monitoring and load balancers:

json
{"status": "ok", "providers": 4}

Error Responses

All errors follow the OpenAI error format:

json
{
  "error": {
    "message": "Provider 'xyz' not found. Available providers: claude, cursor",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

| HTTP Status | Code | Scenario |
| --- | --- | --- |
| 400 | invalid_request | Invalid JSON, missing model or messages |
| 404 | model_not_found | Provider not found |
| 502 | upstream_error | LLM driver returned an error or empty response |
| 504 | timeout | LLM request timed out |

Architecture

The serve gateway shares the daemon's driver instances and providers.yaml configuration. Adding or changing a provider only requires editing the config and restarting the daemon.

External Tool → HTTP → rnix serve → VFS /dev/llm/* → Provider Driver → LLM
