Skip to content

Models API

List all available models across the network (local + QUIC peers + fleet).

{
"object": "list",
"data": [
{"id": "Qwen2.5-3B-Instruct-Q8_0", "object": "model", "owned_by": "local"},
{"id": "Mistral-Small-24B-Q4_K_M", "object": "model", "owned_by": "fleet:aurora"},
{"id": "relay:llama3.2:3b", "object": "model", "owned_by": "relay:ipad"}
]
}

Models prefixed with relay: are served by external devices connected via relay backends.

{
"model_path": "/path/to/model.gguf",
"name": "my-model",
"ctx_len": 4096
}
{
"name": "claude-sonnet",
"backend": "openai",
"api_base": "https://openrouter.ai/api/v1",
"api_key": "secret:openrouter",
"api_model": "anthropic/claude-sonnet-4",
"max_concurrent": 32
}

Use secret:name to reference encrypted secrets instead of raw API keys.

max_concurrent controls how many simultaneous requests this model can handle (default: 32 for API/relay, 1 for local GGUF). See relay docs for details.

{"model": "my-model"}

Models are classified by parameter count:

TierParametersExamples
Tier 1up to 8BQwen 7B, Llama 8B, Phi-4
Tier 2up to 70BMistral Small 24B, Llama 70B
Tier 370B+Qwen 72B, Llama 405B

Access is governed by credit balance — earned by contributing GPU time to the network:

TierCredit balanceAccess
FreeAnonymous / < 10Tier 1 models, 5K tokens/day
Contributor10+ creditsTier 1 + 2 models, 50K tokens/day
Power Seeder50+ creditsAll tiers, highest daily limits

Credits are earned by serving inference and spent by consuming it. The system is designed to be fair — contribute compute, get compute back.