Creating deployments
Register provider deployments behind a public model.
A deployment binds a public name (what clients use), an adapter, an upstream model,
and a set of credentials. Several deployments with the same publicModel form a pool with load
balancing and fallback.
Everything is done through the admin API with the master key (`Authorization: Bearer $MASTER_KEY`):
- Create: `POST /admin/deployments`
- Dry-run (validate without saving or encrypting credentials): `POST /admin/deployments/resolve`
- List / view / edit / delete: `GET` | `GET /:id` | `PATCH /:id` | `DELETE /:id` on `/admin/deployments`
- Inspect adapters and their operations: `GET /admin/operations`
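The endpoints above can be wrapped in a small shell helper. This is a minimal sketch: the function name `admin_api` is hypothetical, and it assumes `BASE` and `MASTER_KEY` are set in the environment as in the examples on this page. It makes no assumption about the response shape.

```shell
# admin_api METHOD PATH [JSON_BODY]
# Hypothetical helper: sends one admin request authenticated with the master key.
admin_api() {
  method=$1
  path=$2
  body=${3:-}
  if [ -n "$body" ]; then
    # Requests with a body (POST, PATCH) are sent as JSON.
    curl -sS -X "$method" "$BASE$path" \
      -H "Authorization: Bearer $MASTER_KEY" \
      -H "content-type: application/json" \
      -d "$body"
  else
    # Requests without a body (GET, DELETE) only carry the auth header.
    curl -sS -X "$method" "$BASE$path" \
      -H "Authorization: Bearer $MASTER_KEY"
  fi
}
```

For example, `admin_api GET /admin/deployments` lists deployments and `admin_api GET /admin/operations` inspects the adapters and their operations.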
Body fields
| Field | Req. | Description |
|---|---|---|
| publicModel | yes | The model's public name; it is what the client puts in "model". |
| provider or adapterKey | yes | provider is a preset (resolves adapter + credentials + transportOverrides). Alternative: an explicit adapterKey. |
| upstreamModel | yes | The real id at the provider (catalog key, or the deployment name on Azure). |
| credentials | yes | { apiKey, baseUrl?, ... } depending on the preset (see table). Encrypted at rest. |
| catalogEntry | custom only | Inline catalog entry for a model not present in the catalog. Uses the same shape as catalog.json; for catalog models it must be omitted. |
| pricing | no | { inputCentsPerMTokens?, outputCentsPerMTokens?, cacheReadCentsPerMTokens? }. |
| transportOverrides | no | Per-operation transport (override; the preset already ships defaults). |
| enabled, weight, tpmLimit, rpmLimit | no | Deployment state and limits. |
Provider presets
| provider | adapter | required credentials |
|---|---|---|
| openai | openai | apiKey |
| googleaistudio | googleaistudio | apiKey |
| anthropic | anthropic | apiKey (version defaulted) |
| azureopenai | azureopenai | apiKey, baseUrl |
| azurefoundry | azurefoundry | apiKey, baseUrl |
| openrouter | openaicompatible | apiKey (baseUrl defaulted) |
| openaicompatible | openaicompatible | apiKey, baseUrl |
Catalog vs custom
- Catalog model (known in code): `upstreamModel` matches an entry in the adapter's catalog. Its capabilities (limits, reasoning, image/audio formats) come from the provider's JSON → do not send `catalogEntry`.
- Custom model (not in the catalog, typical of `openaicompatible`): you must declare `catalogEntry` with `operations`. The presence of an operation implies the model supports it.

Tip: run `POST /admin/deployments/resolve` with the same body to see `source` (catalog/custom), the resolved operations, and the transportOverrides before creating.
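The dry-run-then-create flow from the tip can be scripted. The sketch below is illustrative only: `deploy_checked` is a hypothetical name, and it assumes that the resolve endpoint answers an invalid body with a non-2xx status (which is what makes `curl --fail` return an error). If the gateway reports validation problems differently, inspect the dry-run response instead.

```shell
# deploy_checked BODY_FILE
# Hypothetical helper: dry-runs the body, then creates the deployment only if
# the dry-run request succeeded. Assumes an invalid body yields a non-2xx status.
deploy_checked() {
  body_file=$1
  if ! curl -sS --fail -X POST "$BASE/admin/deployments/resolve" \
      -H "Authorization: Bearer $MASTER_KEY" \
      -H "content-type: application/json" \
      --data-binary @"$body_file"; then
    echo "dry-run failed for $body_file; nothing was created" >&2
    return 1
  fi
  echo
  # Same body, real endpoint: this call saves the deployment.
  curl -sS --fail -X POST "$BASE/admin/deployments" \
    -H "Authorization: Bearer $MASTER_KEY" \
    -H "content-type: application/json" \
    --data-binary @"$body_file"
}
```

With the request body saved to a file, `deploy_checked deployment.json` prints the dry-run response first, so `source` and the resolved operations can be reviewed in the same run.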
Examples by type
1) Text / chat — catalog model
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "gpt-5.4",
"provider": "openai",
"upstreamModel": "gpt-5.4",
"credentials": { "apiKey": "sk-..." }
}'
The client then calls POST /v1/chat/completions (or /v1/responses, /v1/messages) with
"model": "gpt-5.4".
2) Image — catalog model
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "image-default",
"provider": "openai",
"upstreamModel": "gpt-image-1",
"credentials": { "apiKey": "sk-..." }
}'
Client: POST /v1/images/generations or /v1/images/edits (multipart) with "model": "image-default".
3) Embeddings — catalog model
OpenAI:
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "embed",
"provider": "openai",
"upstreamModel": "text-embedding-3-small",
"credentials": { "apiKey": "sk-..." }
}'
Google AI Studio:
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "embed",
"provider": "googleaistudio",
"upstreamModel": "gemini-embedding-001",
"credentials": { "apiKey": "AIza..." }
}'
Azure OpenAI v1:
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "embed",
"provider": "azureopenai",
"upstreamModel": "text-embedding-3-small",
"credentials": {
"apiKey": "...",
"baseUrl": "https://my-resource.openai.azure.com"
}
}'
Client:
curl -X POST $BASE/v1/embeddings -H "Authorization: Bearer $API_KEY" -H "content-type: application/json" -d '{
"model": "embed",
"input": ["red fox", "blue whale"],
"encoding_format": "float",
"dimensions": 768
}'
The public contract is OpenAI-compatible. OpenAI, Azure OpenAI v1, and OpenAI-compatible providers use
/embeddings; Google AI Studio uses :embedContent for a single input and :batchEmbedContents for
a batch. In this gateway, Google accepts text inputs and encoding_format: "float"; pre-tokenized inputs and
base64 are rejected by the profile.
4) Audio transcription — catalog model
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "transcribe",
"provider": "openai",
"upstreamModel": "gpt-4o-transcribe",
"credentials": { "apiKey": "sk-..." }
}'
Client: POST /v1/audio/transcriptions (multipart, file field) with model=transcribe.
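A client call for this deployment might look like the sketch below. The function name `transcribe_file` is hypothetical; it assumes `BASE` and a client key in `API_KEY`, as in the embeddings client example. `curl -F` builds the multipart body and sets the content type itself.

```shell
# transcribe_file AUDIO_PATH
# Hypothetical helper: sends the audio as the multipart "file" field
# together with model=transcribe.
transcribe_file() {
  curl -sS -X POST "$BASE/v1/audio/transcriptions" \
    -H "Authorization: Bearer $API_KEY" \
    -F "file=@$1" \
    -F "model=transcribe"
}
```

For example, `transcribe_file meeting.wav` uploads a local file; the file name here is only a placeholder.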
Special case: Azure OpenAI
Azure requires the classic deployment-based API for transcriptions (it does not exist on /openai/v1).
It is still created under azureopenai; upstreamModel is the deployment name and baseUrl the
resource endpoint. apiVersion is optional (default 2024-06-01; gpt-4o-transcribe may require a
more recent one):
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "transcribe",
"provider": "azureopenai",
"upstreamModel": "my-transcribe-deployment",
"credentials": {
"apiKey": "...",
"baseUrl": "https://my-resource.openai.azure.com",
"apiVersion": "2024-06-01"
}
}'
Since they share `publicModel: "transcribe"`, direct OpenAI and Azure end up in the same pool.
Custom models (OpenAI-compatible)
When the upstreamModel is not in the catalog, declare catalogEntry. It is the same shape as an
entry inside models in any catalog.json; only include the operations the model supports.
Custom text
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "llama-local",
"provider": "openaicompatible",
"upstreamModel": "llama-3.3-70b",
"credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
"catalogEntry": {
"operations": {
"text.generate": {
"capabilities": { "tools": true, "vision": false, "reasoning": false, "structuredOutputs": false },
"maxInputTokens": 131072,
"maxOutputTokens": 8192
}
}
}
}'
Reasoning in custom models
A custom model can declare how it controls reasoning with a reasoning block inside
catalogEntry.operations.text.generate. For it to be valid, capabilities.reasoning must be
true; if it is false, the gateway rejects a reasoning block. The client still uses the canonical
knob (reasoning.effort); the gateway snaps it to levels and translates it to the upstream.
| Field | Description |
|---|---|
| kind | openai_effort (emits reasoning_effort), openai_body (emits a provider-specific top-level field), or chat_template_flag (vLLM-style toggle in chat_template_kwargs). |
| levels | The effort levels the model accepts (["low","medium","high"], or ["high"] for a binary toggle). |
| canDisable | Whether reasoning can be turned off (none). |
| upstreamEffortMap | openai_effort only: translates the canonical effort to the native reasoning_effort label. Keys = canonical efforts; they must be in levels. |
| bodyField, effortField | openai_body only: bodyField defines the on/off top-level field; optional effortField defines where to write the scalar effort. |
| chatTemplateFlag | chat_template_flag only: { param, onValue?, offValue? }. See below. |
a) Standard effort with native label (upstreamEffortMap): a provider that does use
reasoning_effort but with its own vocabulary (e.g. xhigh → "max"):
"reasoning": { "kind": "openai_effort", "levels": ["low","high","xhigh"], "canDisable": true, "upstreamEffortMap": { "xhigh": "max" } }
b) vLLM-style toggle (chat_template_flag): thinking is enabled with a flag inside
chat_template_kwargs (kimi thinking, Qwen enable_thinking, etc.), NOT with reasoning_effort.
The canonical effort only decides on/off: active → onValue (default true), inactive → offValue
(if omitted, the parameter is not emitted and the template default wins).
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "kimi-k2.6",
"provider": "openaicompatible",
"upstreamModel": "moonshotai/kimi-k2.6",
"credentials": { "apiKey": "x", "baseUrl": "https://integrate.api.nvidia.com/v1" },
"catalogEntry": {
"operations": {
"text.generate": {
"capabilities": { "tools": true, "vision": true, "reasoning": true, "structuredOutputs": false },
"maxInputTokens": 262144,
"maxOutputTokens": 65536,
"reasoning": {
"kind": "chat_template_flag",
"levels": ["high"],
"canDisable": true,
"chatTemplateFlag": { "param": "thinking", "offValue": false }
}
}
}
}
}'
Resulting public API: reasoning.effort ∈ none | high. With high the gateway emits
chat_template_kwargs: { "thinking": true }; with none (or when reasoning is not requested) it emits
{ "thinking": false } (via offValue; if you omit offValue, the flag is not emitted and the template
default wins). The gateway's value overrides any chat_template_kwargs.thinking the client sends via
extra_body, but the other keys the client puts there are preserved.
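The effort-to-flag translation for this example can be written out as a tiny shell function. This is purely an illustration of the mapping, not the gateway's code: the name `thinking_kwargs` is hypothetical, and it hard-codes the kimi-k2.6 configuration (param `thinking`, default onValue, `offValue: false`, levels `["high"]`). How the gateway snaps other effort values onto `levels` is not specified here, so the sketch simply rejects them.

```shell
# thinking_kwargs [EFFORT]
# Illustration only: prints the chat_template_kwargs object that corresponds
# to a canonical effort value under the kimi-k2.6 example configuration.
thinking_kwargs() {
  case "${1:-none}" in
    high) printf '%s\n' '{ "thinking": true }' ;;   # active -> onValue (default true)
    none) printf '%s\n' '{ "thinking": false }' ;;  # inactive or not requested -> offValue
    *)    echo "unsupported effort: $1" >&2; return 1 ;;
  esac
}
```

Calling `thinking_kwargs high` prints `{ "thinking": true }`, and calling it with `none` or with no argument prints `{ "thinking": false }`.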
Vision: with `capabilities.vision: true` the client passes images in the `content` array as `type: "image_url"` (URL or base64) and the adapter forwards them as-is. The canonical contract covers images/audio/files, not `video_url`; video would require extending the parts model.
Custom image
Image operations require outputFormats, responseFormats, and sizes (or arbitrarySize):
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "img-local",
"provider": "openaicompatible",
"upstreamModel": "sdxl",
"credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
"catalogEntry": {
"operations": {
"image.generate": {
"maxN": 1,
"outputFormats": ["png"],
"responseFormats": ["b64_json"],
"sizes": { "1024x1024": {} }
}
}
}
}'
Custom transcription
responseFormats is required; the rest is optional:
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "transcribe-local",
"provider": "openaicompatible",
"upstreamModel": "custom-transcribe",
"credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
"catalogEntry": {
"operations": {
"audio.transcribe": {
"responseFormats": ["json", "text", "verbose_json"],
"supportsStreaming": false,
"supportsTimestampGranularities": true,
"maxFileBytes": 26214400
}
}
}
}'
Custom embeddings
For OpenAI-compatible providers without a built-in catalog, declare embedding.create:
curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
"publicModel": "embed-local",
"provider": "openaicompatible",
"upstreamModel": "my-embedding-model",
"credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
"catalogEntry": {
"operations": {
"embedding.create": {
"dimensions": 1024,
"supportsDimensions": false,
"encodingFormats": ["float"],
"maxInputTokens": 8192,
"supportsTokenInput": false
}
},
"pricing": { "inputCentsPerMTokens": 2 }
}
}'
Notes
- The gateway validates each request against the operation's profile (formats, streaming, sizes, limits). Requesting something outside the profile returns `400` with `code: "unsupported_parameter"`.
- For several providers behind the same public name, create several deployments with the same `publicModel` (the router balances and applies fallback).
- Edit with `PATCH /admin/deployments/:id` (same fields; `catalogEntry`/`pricing` accept `null` to clear).
- The semantics of per-deployment retries, `reason`, and chain lifecycle are documented in fallbacks.
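Editing an existing deployment can be sketched the same way as the create calls. The helper name `patch_deployment` is hypothetical, and the deployment id must come from a previous create or list response, whose shape is not assumed here.

```shell
# patch_deployment ID JSON_PATCH
# Hypothetical helper: applies a partial update to one deployment.
patch_deployment() {
  curl -sS -X PATCH "$BASE/admin/deployments/$1" \
    -H "Authorization: Bearer $MASTER_KEY" \
    -H "content-type: application/json" \
    -d "$2"
}
```

For example, `patch_deployment "$ID" '{"pricing": null}'` clears the pricing override. Taking a deployment out of its pool with `patch_deployment "$ID" '{"enabled": false}'` assumes `enabled` is a boolean, which the field table does not state explicitly.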