Concepts
The mental model: how a request flows through the gateway.
Unified Gateway sits between your clients and many providers. Clients speak a stable, public contract; the gateway translates each request into one provider-agnostic internal shape, routes it to a real provider, and translates the answer back. This page is the mental model; the Glossary is the precise, canonical definition of each term.
The one diagram
Client -> Public Endpoint -> Canonical request
-> Public Model -> deployment pool -> Deployment
-> Transport -> provider -> Canonical response
-> Public Endpoint contract -> client
Everything below is just the pieces of that line.
The pieces
Public Endpoint — the route a client calls, with that API's exact contract:
/v1/chat/completions, /v1/responses, /v1/messages, /v1/embeddings, /v1/images/*,
/v1/audio/transcriptions. The endpoint decides the client-facing shape, not the provider protocol.
Public Model — the name a client sends in the model field (for example, general or image-default). It is
yours to choose, not a provider's model id. There is no separate "model group" entity: all deployments
that share a publicModel together form its deployment pool.
Deployment — one executable route to a provider: a publicModel + adapter + upstreamModel +
encrypted credentials + weight and limits. Several deployments under the same public name give you
load balancing and in-pool failover without any extra configuration.
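To make the pool idea concrete, here is a minimal TypeScript sketch that groups deployments by publicModel. The publicModel, adapter, upstreamModel, and weight fields come from the description above; the Deployment interface itself, the credentialRef and rpm fields, the adapter ids, the upstream model ids, and the buildPools helper are hypothetical stand-ins, not the gateway's real schema.

```typescript
// Hypothetical deployment record. publicModel, adapter, upstreamModel, and
// weight mirror the description above; the other fields are illustrative.
interface Deployment {
  id: string;
  publicModel: string; // client-facing name, e.g. "general"
  adapter: string; // which provider adapter runs this route
  upstreamModel: string; // the provider's own model id
  credentialRef: string; // hypothetical pointer to encrypted credentials
  weight: number; // relative share of traffic inside the pool
  rpm?: number; // hypothetical requests-per-minute limit
}

// No "model group" entity: a pool is every deployment sharing a publicModel.
function buildPools(deployments: Deployment[]): Map<string, Deployment[]> {
  const pools = new Map<string, Deployment[]>();
  for (const d of deployments) {
    const pool = pools.get(d.publicModel) ?? [];
    pool.push(d);
    pools.set(d.publicModel, pool);
  }
  return pools;
}

const deployments: Deployment[] = [
  { id: "oai-1", publicModel: "general", adapter: "openai", upstreamModel: "provider-model-a", credentialRef: "cred-1", weight: 3 },
  { id: "ant-1", publicModel: "general", adapter: "anthropic", upstreamModel: "provider-model-b", credentialRef: "cred-2", weight: 1, rpm: 600 },
  { id: "img-1", publicModel: "image-default", adapter: "openai", upstreamModel: "provider-image-model", credentialRef: "cred-1", weight: 1 },
];

const pools = buildPools(deployments);
console.log(pools.get("general")?.map((d) => d.id).join(", ")); // oai-1, ant-1
```

Adding a third deployment under general needs no other change: it simply joins the existing pool.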
Adapter — the code that knows a provider. It translates the canonical request into one or more upstream protocols and maps provider errors back into the gateway's error classes.
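As an illustration of the error-mapping half of an adapter, the sketch below turns an upstream HTTP status into a shared error class plus a retryable flag. The class names, the status-to-class table, and the choice to treat auth failures as retryable (another deployment may hold valid credentials) are assumptions made for this example; the gateway defines its own error classes.

```typescript
// Hypothetical gateway error classes; the real names belong to the gateway.
type GatewayErrorClass =
  | "rate_limited"
  | "auth"
  | "invalid_request"
  | "upstream_unavailable"
  | "unknown";

interface GatewayError {
  errorClass: GatewayErrorClass;
  retryable: boolean; // may the router move on to the next candidate?
  message: string;
}

// Each adapter maps its provider's failures into the shared classes, so the
// router never has to inspect raw provider errors.
function mapProviderError(status: number, providerMessage: string): GatewayError {
  const make = (errorClass: GatewayErrorClass, retryable: boolean): GatewayError => ({
    errorClass,
    retryable,
    message: providerMessage,
  });
  if (status === 429) return make("rate_limited", true);
  // Assumption: a different deployment may carry working credentials.
  if (status === 401 || status === 403) return make("auth", true);
  // The same request would fail on every candidate, so do not retry.
  if (status === 400 || status === 422) return make("invalid_request", false);
  if (status >= 500) return make("upstream_unavailable", true);
  return make("unknown", false);
}

console.log(mapProviderError(503, "overloaded").errorClass); // upstream_unavailable
```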
Transport — the actual protocol the adapter speaks to the provider: chat_completions,
responses, generate_content, messages, images, audio_transcriptions, embeddings, or
embed_content. It is inferred per operation from the adapter; transportOverrides is the rare
manual override.
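A minimal sketch of per-operation transport inference, assuming an operation is one of chat, embeddings, images, or audio_transcriptions and that each adapter carries a default transport per operation. The transport names are the ones listed above; the adapter ids, the default table, and the shape of transportOverrides (a partial operation-to-transport map) are hypothetical.

```typescript
type Transport =
  | "chat_completions" | "responses" | "generate_content" | "messages"
  | "images" | "audio_transcriptions" | "embeddings" | "embed_content";

// Hypothetical operation names; the gateway's operation profile may differ.
type Operation = "chat" | "embeddings" | "images" | "audio_transcriptions";

// Assumed per-adapter defaults. Partial: an adapter need not support every operation.
const adapterDefaults: Record<string, Partial<Record<Operation, Transport>>> = {
  openai: { chat: "chat_completions", embeddings: "embeddings", images: "images", audio_transcriptions: "audio_transcriptions" },
  anthropic: { chat: "messages" },
  gemini: { chat: "generate_content", embeddings: "embed_content" },
};

function resolveTransport(
  adapter: string,
  operation: Operation,
  transportOverrides: Partial<Record<Operation, Transport>> = {},
): Transport {
  // The rare manual override wins; otherwise infer from the adapter.
  const transport = transportOverrides[operation] ?? adapterDefaults[adapter]?.[operation];
  if (transport === undefined) {
    throw new Error(`adapter "${adapter}" has no transport for operation "${operation}"`);
  }
  return transport;
}

console.log(resolveTransport("gemini", "embeddings")); // embed_content
console.log(resolveTransport("openai", "chat", { chat: "responses" })); // responses
```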
Canonical — the single provider-agnostic representation in the core. Every public wire format is
converted into canonical on the way in, and rendered from canonical on the way out. This is why an
extension hook or a fallback target written once works across /v1/chat/completions, /v1/responses,
and /v1/messages alike — they are the same canonical request underneath.
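To show why one canonical shape pays off, the sketch below converts two simplified public wire formats into the same hypothetical CanonicalRequest. It relies on one real difference between the formats: Chat Completions carries the system prompt as a message with role "system", while an Anthropic-style /v1/messages body carries it in a top-level system field. Both request types are trimmed to text-only messages, and the canonical field names are invented for the example.

```typescript
// Simplified public wire formats (text-only; real bodies carry far more fields).
interface ChatCompletionsRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

interface MessagesRequest {
  model: string;
  system?: string; // system prompt is a top-level field here
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical canonical shape; the gateway's real canonical type is richer.
interface CanonicalRequest {
  publicModel: string;
  system?: string;
  turns: { role: "user" | "assistant"; text: string }[];
  maxOutputTokens?: number;
}

function fromChatCompletions(req: ChatCompletionsRequest): CanonicalRequest {
  const systemParts = req.messages.filter((m) => m.role === "system").map((m) => m.content);
  const turns = req.messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ role: m.role as "user" | "assistant", text: m.content }));
  return {
    publicModel: req.model,
    system: systemParts.length > 0 ? systemParts.join("\n") : undefined,
    turns,
  };
}

function fromMessages(req: MessagesRequest): CanonicalRequest {
  return {
    publicModel: req.model,
    system: req.system,
    turns: req.messages.map((m) => ({ role: m.role, text: m.content })),
    maxOutputTokens: req.max_tokens,
  };
}

const fromChat = fromChatCompletions({
  model: "general",
  messages: [
    { role: "system", content: "Be brief." },
    { role: "user", content: "Hi" },
  ],
});
const fromAnthropic = fromMessages({
  model: "general",
  system: "Be brief.",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hi" }],
});

// Both wire formats land on the same model, system prompt, and turns.
const same =
  JSON.stringify([fromChat.publicModel, fromChat.system, fromChat.turns]) ===
  JSON.stringify([fromAnthropic.publicModel, fromAnthropic.system, fromAnthropic.turns]);
console.log(same); // true
```

Anything downstream (hooks, routing, fallback) only ever sees CanonicalRequest, which is why it is written once.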
Why public names, not provider ids
Decoupling the client-facing name from the provider model is what makes the rest possible:
- Swap providers without touching clients — repoint general from OpenAI to Anthropic by editing deployments; the client keeps sending model: "general".
- Pool and balance — add a second deployment under general and the router spreads load and retries across both.
- Fall back — when a pool is exhausted, a fallback chain keyed by (publicModel, reason) advances to another public model with the same canonical request.
- One control plane — auth, virtual-key scopes, budgets, rate limits, logging, and cost accounting all key off the public name, regardless of which provider served it.
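The fallback bullet can be sketched as a lookup keyed by the pair (publicModel, reason). The reason names, the public model names general-backup, small, and long-context, the string encoding of the key, and the policy of walking the original model's chain while skipping already-tried models are all assumptions for illustration.

```typescript
// Hypothetical fallback reasons; the gateway defines its own set.
type FallbackReason = "rate_limited" | "upstream_unavailable" | "context_too_long";

// A Map needs one key, so this sketch joins the (publicModel, reason) pair.
function chainKey(publicModel: string, reason: FallbackReason): string {
  return `${publicModel}|${reason}`;
}

// Each chain lists other public models, tried in order.
const fallbackChains = new Map<string, string[]>([
  [chainKey("general", "upstream_unavailable"), ["general-backup", "small"]],
  [chainKey("general", "context_too_long"), ["long-context"]],
]);

// Returns the next public model to try, or undefined when the chain is spent.
function nextFallback(publicModel: string, reason: FallbackReason, tried: Set<string>): string | undefined {
  const chain = fallbackChains.get(chainKey(publicModel, reason)) ?? [];
  return chain.find((m) => !tried.has(m));
}

// Pretend every target is also exhausted, to walk the whole chain.
const tried = new Set<string>(["general"]);
let target = nextFallback("general", "upstream_unavailable", tried);
while (target !== undefined) {
  console.log("re-running the same canonical request against", target);
  tried.add(target);
  target = nextFallback("general", "upstream_unavailable", tried);
}
```

The request itself never changes; only the public model it is routed under does.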
How a request is served
1. A client calls a Public Endpoint with a virtual or master key and a model.
2. Auth resolves the key, checks that the model is within the key's scope, and applies rate limits.
3. The endpoint parses the public contract into a canonical request.
4. The router resolves the Public Model to its deployment pool and picks candidates by weight, cooldown, and the request's operation profile.
5. Each candidate runs over its Transport; retryable failures move on to the next candidate, and an exhausted pool may enter a fallback chain.
6. The canonical response is rendered back into the calling endpoint's contract and returned with an x-request-id header and, for limited keys, x-ratelimit-* headers.
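The routing and retry steps above can be sketched as a weighted, cooldown-aware ordering of the pool followed by a retry loop. This is a hypothetical implementation: the Candidate fields, the weighted draw without replacement, and the three-way result are illustrative choices, and the real router also considers the request's operation profile, which this sketch leaves out.

```typescript
interface Candidate {
  id: string;
  weight: number;
  cooldownUntil: number; // epoch ms; 0 means not cooling down (hypothetical field)
}

type Attempt = { ok: true; body: string } | { ok: false; retryable: boolean };

type ServeResult =
  | { kind: "served"; by: string; body: string }
  | { kind: "failed" } // non-retryable error: surfaced to the client
  | { kind: "exhausted" }; // every candidate failed retryably: fallback may apply

// Weighted random draw without replacement, skipping cooling-down deployments.
// `rand` is injectable so the ordering can be made deterministic.
function orderCandidates(pool: Candidate[], now: number, rand: () => number = Math.random): Candidate[] {
  const remaining = pool.filter((c) => c.cooldownUntil <= now && c.weight > 0);
  const ordered: Candidate[] = [];
  while (remaining.length > 0) {
    const total = remaining.reduce((sum, c) => sum + c.weight, 0);
    let r = rand() * total;
    let idx = 0;
    while (idx < remaining.length - 1 && r >= remaining[idx].weight) {
      r -= remaining[idx].weight;
      idx++;
    }
    ordered.push(remaining.splice(idx, 1)[0]);
  }
  return ordered;
}

// Retryable failures move on to the next candidate; anything else stops.
function serve(candidates: Candidate[], call: (c: Candidate) => Attempt): ServeResult {
  for (const c of candidates) {
    const attempt = call(c);
    if (attempt.ok) return { kind: "served", by: c.id, body: attempt.body };
    if (!attempt.retryable) return { kind: "failed" };
  }
  return { kind: "exhausted" };
}

const now = Date.now();
const pool: Candidate[] = [
  { id: "oai-1", weight: 3, cooldownUntil: 0 },
  { id: "ant-1", weight: 1, cooldownUntil: 0 },
  { id: "oai-2", weight: 1, cooldownUntil: now + 30000 }, // cooling down: skipped
];
const order = orderCandidates(pool, now, () => 0); // rand() = 0 always takes the first remaining
// oai-1 fails retryably (say, a 503), so the router moves on to ant-1.
const result = serve(order, (c): Attempt =>
  c.id === "oai-1" ? { ok: false, retryable: true } : { ok: true, body: `hello from ${c.id}` },
);
console.log(result.kind === "served" ? result.by : result.kind); // ant-1
```

An "exhausted" result is the natural hand-off point to a fallback chain.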
Optional layers wrap this: an opt-in response cache short-circuits step 4 on a hit, and runtime extensions can hook the canonical request, response, stream, image output, or errors.
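A rough sketch of how the optional layers could wrap the core, assuming a synchronous route function and a cache keyed on the serialized canonical request. The Extension interface with onRequest and onResponse hooks, the makeHandler name, the simplified CanonicalRequest, and the decision to run request hooks before the cache lookup are all hypothetical; the real hook points (request, response, stream, image output, errors) are richer than these two.

```typescript
// Simplified stand-ins for the canonical types.
interface CanonicalRequest { publicModel: string; prompt: string }
interface CanonicalResponse { text: string; cached: boolean }

// Hypothetical extension shape: hooks only ever see canonical objects.
interface Extension {
  onRequest?: (req: CanonicalRequest) => CanonicalRequest;
  onResponse?: (res: CanonicalResponse) => CanonicalResponse;
}

function makeHandler(
  route: (req: CanonicalRequest) => CanonicalResponse, // routing + provider call
  extensions: Extension[],
  cacheEnabled: boolean, // the cache is opt-in
) {
  const cache = new Map<string, CanonicalResponse>();
  return (incoming: CanonicalRequest): CanonicalResponse => {
    let req = incoming;
    for (const ext of extensions) if (ext.onRequest) req = ext.onRequest(req);

    // Assumption: the cache key is the canonical request after request hooks.
    const key = JSON.stringify(req);
    const hit = cacheEnabled ? cache.get(key) : undefined;
    let res: CanonicalResponse;
    if (hit) {
      res = { ...hit, cached: true }; // hit: routing and the provider call are skipped
    } else {
      res = route(req);
      if (cacheEnabled) cache.set(key, res);
    }
    for (const ext of extensions) if (ext.onResponse) res = ext.onResponse(res);
    return res;
  };
}

let providerCalls = 0;
const handle = makeHandler(
  (req) => {
    providerCalls++;
    return { text: `echo: ${req.prompt}`, cached: false };
  },
  [{ onRequest: (req) => ({ ...req, prompt: req.prompt.trim() }) }],
  true,
);
handle({ publicModel: "general", prompt: "hi" });
const second = handle({ publicModel: "general", prompt: "  hi " }); // trims to the same key
console.log(providerCalls, second.cached); // 1 true
```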
Where to go next
- Quickstart — do all of the above once, hands-on.
- Creating deployments — define public models over real providers.
- Glossary — the exact, contract-level definition of every term here.