Unified Gateway

Creating deployments

Register provider deployments behind a public model.

A deployment binds a public name (what clients use), an adapter, an upstream model, and a set of credentials. Several deployments with the same publicModel form a pool with load balancing and fallback.

Everything is done through the admin API with the master key:

Authorization: Bearer $MASTER_KEY
  • Create: POST /admin/deployments
  • Dry-run (validate without saving or encrypting credentials): POST /admin/deployments/resolve
  • List / view / edit / delete: GET|GET/:id|PATCH/:id|DELETE/:id on /admin/deployments
  • Inspect adapters and their operations: GET /admin/operations

Body fields

FieldReq.Description
publicModelyesThe model's public name; it is what the client puts in "model".
provider or adapterKeyyesprovider is a preset (resolves adapter + credentials + transportOverrides). Alternative: an explicit adapterKey.
upstreamModelyesThe real id at the provider (catalog key, or the deployment name on Azure).
credentialsyes{ apiKey, baseUrl?, ... } depending on the preset (see table). Encrypted at rest.
catalogEntrycustom onlyInline catalog entry for a model not present in the catalog. Uses the same shape as catalog.json; for catalog models it must be omitted.
pricingno{ inputCentsPerMTokens?, outputCentsPerMTokens?, cacheReadCentsPerMTokens? }.
transportOverridesnoPer-operation transport (override; the preset already ships defaults).
enabled, weight, tpmLimit, rpmLimitnoDeployment state and limits.

Provider presets

provideradapterrequired credentials
openaiopenaiapiKey
googleaistudiogoogleaistudioapiKey
anthropicanthropicapiKey (version defaulted)
azureopenaiazureopenaiapiKey, baseUrl
azurefoundryazurefoundryapiKey, baseUrl
openrouteropenaicompatibleapiKey (baseUrl defaulted)
openaicompatibleopenaicompatibleapiKey, baseUrl

Catalog vs custom

  • Catalog model (known in code): upstreamModel matches an entry in the adapter's catalog. Its capabilities (limits, reasoning, image/audio formats) come from the provider's JSON → do not send catalogEntry.
  • Custom model (not in the catalog, typical of openaicompatible): you must declare catalogEntry with operations. The presence of an operation implies the model supports it.

Tip: run POST /admin/deployments/resolve with the same body to see source (catalog/custom), the resolved operations, and the transportOverrides before creating.


Examples by type

1) Text / chat — catalog model

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "gpt-5.4",
  "provider": "openai",
  "upstreamModel": "gpt-5.4",
  "credentials": { "apiKey": "sk-..." }
}'

The client then calls POST /v1/chat/completions (or /v1/responses, /v1/messages) with "model": "gpt-5.4".

2) Image — catalog model

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "image-default",
  "provider": "openai",
  "upstreamModel": "gpt-image-1",
  "credentials": { "apiKey": "sk-..." }
}'

Client: POST /v1/images/generations or /v1/images/edits (multipart) with "model": "image-default".

3) Embeddings — catalog model

OpenAI:

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "embed",
  "provider": "openai",
  "upstreamModel": "text-embedding-3-small",
  "credentials": { "apiKey": "sk-..." }
}'

Google AI Studio:

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "embed",
  "provider": "googleaistudio",
  "upstreamModel": "gemini-embedding-001",
  "credentials": { "apiKey": "AIza..." }
}'

Azure OpenAI v1:

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "embed",
  "provider": "azureopenai",
  "upstreamModel": "text-embedding-3-small",
  "credentials": {
    "apiKey": "...",
    "baseUrl": "https://my-resource.openai.azure.com"
  }
}'

Client:

curl -X POST $BASE/v1/embeddings -H "Authorization: Bearer $API_KEY" -H "content-type: application/json" -d '{
  "model": "embed",
  "input": ["red fox", "blue whale"],
  "encoding_format": "float",
  "dimensions": 768
}'

The public contract is OpenAI-compatible. OpenAI, Azure OpenAI v1, and OpenAI-compatible use /embeddings; Google AI Studio uses :embedContent for a single input and :batchEmbedContents for a batch. Google accepts text and encoding_format: "float" in this gateway; pre-tokenized inputs and base64 are rejected by profile.

4) Audio transcription — catalog model

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "transcribe",
  "provider": "openai",
  "upstreamModel": "gpt-4o-transcribe",
  "credentials": { "apiKey": "sk-..." }
}'

Client: POST /v1/audio/transcriptions (multipart, file field) with model=transcribe.

Special case: Azure OpenAI

Azure requires the classic deployment-based API for transcriptions (it does not exist on /openai/v1). It is still created under azureopenai; upstreamModel is the deployment name and baseUrl the resource endpoint. apiVersion is optional (default 2024-06-01; gpt-4o-transcribe may require a more recent one):

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "transcribe",
  "provider": "azureopenai",
  "upstreamModel": "my-transcribe-deployment",
  "credentials": {
    "apiKey": "...",
    "baseUrl": "https://my-resource.openai.azure.com",
    "apiVersion": "2024-06-01"
  }
}'

Since they share publicModel: "transcribe", direct OpenAI and Azure end up in the same pool.


Custom models (OpenAI-compatible)

When the upstreamModel is not in the catalog, declare catalogEntry. It is the same shape as an entry inside models in any catalog.json; only include the operations the model supports.

Custom text

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "llama-local",
  "provider": "openaicompatible",
  "upstreamModel": "llama-3.3-70b",
  "credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
  "catalogEntry": {
    "operations": {
      "text.generate": {
        "capabilities": { "tools": true, "vision": false, "reasoning": false, "structuredOutputs": false },
        "maxInputTokens": 131072,
        "maxOutputTokens": 8192
      }
    }
  }
}'

Reasoning in custom models

A custom model can declare how it controls reasoning with a reasoning block inside catalogEntry.operations.text.generate. For it to be valid, capabilities.reasoning must be true; if it is false, the gateway rejects a reasoning block. The client still uses the canonical knob (reasoning.effort); the gateway snaps it to levels and translates it to the upstream.

FieldDescription
kindopenai_effort (emits reasoning_effort), openai_body (emits a provider-specific top-level field), or chat_template_flag (vLLM-style toggle in chat_template_kwargs).
levelsThe effort levels the model accepts (["low","medium","high"], or ["high"] for a binary toggle).
canDisableWhether reasoning can be turned off (none).
upstreamEffortMapopenai_effort only: translates the canonical effort to the native reasoning_effort label. Keys = canonical efforts; they must be in levels.
bodyField, effortFieldopenai_body only: bodyField defines the on/off top-level field; optional effortField defines where to write the scalar effort.
chatTemplateFlagchat_template_flag only: { param, onValue?, offValue? }. See below.

a) Standard effort with native label (upstreamEffortMap): a provider that does use reasoning_effort but with its own vocabulary (e.g. xhigh → "max"):

"reasoning": { "kind": "openai_effort", "levels": ["low","high","xhigh"], "canDisable": true, "upstreamEffortMap": { "xhigh": "max" } }

b) vLLM-style toggle (chat_template_flag): thinking is enabled with a flag inside chat_template_kwargs (kimi thinking, Qwen enable_thinking, etc.), NOT with reasoning_effort. The canonical effort only decides on/off: active → onValue (default true), inactive → offValue (if omitted, the parameter is not emitted and the template default wins).

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "kimi-k2.6",
  "provider": "openaicompatible",
  "upstreamModel": "moonshotai/kimi-k2.6",
  "credentials": { "apiKey": "x", "baseUrl": "https://integrate.api.nvidia.com/v1" },
  "catalogEntry": {
    "operations": {
      "text.generate": {
        "capabilities": { "tools": true, "vision": true, "reasoning": true, "structuredOutputs": false },
        "maxInputTokens": 262144,
        "maxOutputTokens": 65536,
        "reasoning": {
          "kind": "chat_template_flag",
          "levels": ["high"],
          "canDisable": true,
          "chatTemplateFlag": { "param": "thinking", "offValue": false }
        }
      }
    }
  }
}'

Resulting public API: reasoning.effortnone | high. With high the gateway emits chat_template_kwargs: { "thinking": true }; with none (or when not requested) it emits { "thinking": false } (via offValue; if you omit it, the flag is not emitted and the template default wins). The gateway wins over a chat_template_kwargs.thinking the client sends via extra_body, but preserves the other keys the client puts there.

Vision: with capabilities.vision: true the client passes images in the content array as type: "image_url" (URL or base64) and the adapter forwards them as-is. The canonical contract covers images/audio/files, not video_url; video would require extending the parts model.

Custom image

Image operations require outputFormats, responseFormats, and sizes (or arbitrarySize):

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "img-local",
  "provider": "openaicompatible",
  "upstreamModel": "sdxl",
  "credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
  "catalogEntry": {
    "operations": {
      "image.generate": {
        "maxN": 1,
        "outputFormats": ["png"],
        "responseFormats": ["b64_json"],
        "sizes": { "1024x1024": {} }
      }
    }
  }
}'

Custom transcription

responseFormats is required; the rest is optional:

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "transcribe-local",
  "provider": "openaicompatible",
  "upstreamModel": "custom-transcribe",
  "credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
  "catalogEntry": {
    "operations": {
      "audio.transcribe": {
        "responseFormats": ["json", "text", "verbose_json"],
        "supportsStreaming": false,
        "supportsTimestampGranularities": true,
        "maxFileBytes": 26214400
      }
    }
  }
}'

Custom embeddings

For OpenAI-compatible providers without a built-in catalog, declare embedding.create:

curl -X POST $BASE/admin/deployments -H "Authorization: Bearer $MASTER_KEY" -H "content-type: application/json" -d '{
  "publicModel": "embed-local",
  "provider": "openaicompatible",
  "upstreamModel": "my-embedding-model",
  "credentials": { "apiKey": "x", "baseUrl": "http://localhost:8000/v1" },
  "catalogEntry": {
    "operations": {
      "embedding.create": {
        "dimensions": 1024,
        "supportsDimensions": false,
        "encodingFormats": ["float"],
        "maxInputTokens": 8192,
        "supportsTokenInput": false
      }
    },
    "pricing": { "inputCentsPerMTokens": 2 }
  }
}'

Notes

  • The gateway validates each request against the operation's profile (formats, streaming, sizes, limits). Requesting something outside the profile returns 400 with code: "unsupported_parameter".
  • For several providers behind the same public name, create several deployments with the same publicModel (the router balances and applies fallback).
  • Edit with PATCH /admin/deployments/:id (same fields; catalogEntry/pricing accept null to clear).
  • The semantics of per-deployment retries, reason, and chain lifecycle are in fallbacks.

On this page