Runtime extensions
Hooks, manifest format, failure policy, and the extension SDK.
Runtime extensions let an operator attach trusted behavior to Unified Gateway without forking the repository or rebuilding the official image. They are designed for Docker, Portainer, Coolify, Dokploy, and similar environments: mount extension files as a read-only volume and point the gateway to a manifest.
Extensions are split into two concepts:
- Definitions: trusted ESM modules installed or mounted by the operator.
- Instances: manifest entries that configure a definition with match, config, priority, and failure policy.
Admins can inspect extension status through GET /admin/extensions, but v1 does not allow editing or
uploading extension code through the Admin API.
Manifest
Set:
UNIFIED_GATEWAY_EXTENSIONS_MANIFEST=/app/extensions/extensions.json
UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES=3
UNIFIED_GATEWAY_EXTENSION_HOOK_TIMEOUT_MS=5000
Example manifest:
{
  "modules": [{ "path": "./chat-defaults.mjs" }],
  "instances": [
    {
      "id": "chat-defaults-general",
      "definition": "chatdefaults",
      "enabled": true,
      "priority": 50,
      "critical": false,
      "match": {
        "callTypes": ["chat"]
      },
      "config": {
        "temperature": 0.2,
        "maxTokens": 1024
      }
    }
  ]
}
Paths are resolved relative to the manifest file.
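Because module paths resolve against the manifest's location rather than the process working directory, a small illustration may help. The sketch below uses the standard WHATWG `URL` resolver to show the rule; `resolveModulePaths` is a hypothetical helper written for this example and is not the gateway's own loader.

```javascript
// Hypothetical helper: illustrates manifest-relative resolution only.
// The gateway's real loader is not shown in this document.
function resolveModulePaths(manifestPath, manifest) {
  // Use the manifest location as the base so "./x.mjs" resolves against
  // the manifest's directory, not the current working directory.
  const base = new URL(`file://${manifestPath}`);
  return manifest.modules.map((m) => new URL(m.path, base).pathname);
}

const manifest = { modules: [{ path: "./chat-defaults.mjs" }] };
const resolved = resolveModulePaths("/app/extensions/extensions.json", manifest);
console.log(resolved); // [ '/app/extensions/chat-defaults.mjs' ]
```

With the Docker Compose mount shown in this section, that means a module referenced as `./chat-defaults.mjs` is expected next to `extensions.json` inside the mounted directory.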
Docker Compose pattern:
services:
  unifiedgateway:
    image: ghcr.io/boelabs/unified-gateway:1
    volumes:
      - ./extensions:/app/extensions:ro
    environment:
      UNIFIED_GATEWAY_EXTENSIONS_MANIFEST: /app/extensions/extensions.json
      UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES: 3
Module API
Extension modules export a definition. Mounted modules under /app/extensions can import the SDK
through the package import map:
import { defineExtension } from "#extensions/sdk.ts";

export default defineExtension({
  key: "chatdefaults",
  version: "1.0.0",
  label: "Chat defaults",
  description: "Applies default generation parameters.",
  hooks: {
    // Fill in generation defaults only where the caller left them unset.
    onCanonicalRequest(ctx, request) {
      if (request.callType !== "chat") return request;
      return {
        ...request,
        temperature: request.temperature ?? ctx.config.temperature,
        maxTokens: request.maxTokens ?? ctx.config.maxTokens
      };
    }
  }
});
The repository includes ready-to-mount examples in apps/gateway/examples/extensions (see its
README for the full write-up). They cover every hook:
- prompt-firewall.mjs: neutralizes or blocks prompt-injection attempts in inbound text (onCanonicalRequest).
- pii-vault.mjs: tokenizes PII before it reaches the upstream model and restores it in the reply, including across streaming chunk boundaries (onCanonicalRequest + onCanonicalResponse + onStreamEvent + onError).
- provenance-watermark.mjs: embeds an invisible zero-width provenance marker into assistant text (onCanonicalResponse + onStreamEvent).
- tiered-image-watermark.mjs: stamps a visible preview watermark on images for non-privileged keys (onImageOutput).
Because hooks run after each public wire format is translated into Unified Gateway's canonical request,
one callTypes: ["chat"] instance can affect all text wires: /v1/chat/completions,
/v1/responses, and /v1/messages. Extension authors do not need separate branches for
OpenAI-style messages, OpenAI Responses input/instructions, or Anthropic system.
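To illustrate, the sketch below runs the chat-defaults hook logic from the Module API example against canonical requests that arrived through different public endpoints. The `ctx` and request objects are simplified stand-ins invented for this example; real canonical requests carry more fields.

```javascript
// Same logic as the chat-defaults hook, written as a plain function so it
// can run outside the gateway. ctx and request shapes are simplified stand-ins.
function onCanonicalRequest(ctx, request) {
  if (request.callType !== "chat") return request;
  return {
    ...request,
    temperature: request.temperature ?? ctx.config.temperature,
    maxTokens: request.maxTokens ?? ctx.config.maxTokens,
  };
}

const config = { temperature: 0.2, maxTokens: 1024 };

// Arrived via /v1/chat/completions with no sampling options set.
const fromChat = onCanonicalRequest(
  { endpoint: "/v1/chat/completions", config },
  { callType: "chat" }
);

// Arrived via /v1/messages with an explicit temperature, which is preserved.
const fromMessages = onCanonicalRequest(
  { endpoint: "/v1/messages", config },
  { callType: "chat", temperature: 0.9 }
);

// A non-chat call is returned untouched (same object reference).
const embeddingRequest = { callType: "embeddings" };
const untouched = onCanonicalRequest({ config }, embeddingRequest);

console.log(fromChat.temperature, fromChat.maxTokens);         // 0.2 1024
console.log(fromMessages.temperature, fromMessages.maxTokens); // 0.9 1024
console.log(untouched === embeddingRequest);                   // true
```

The hook never inspects the wire format, which is why a single instance covers all three text endpoints.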
Available hooks in v1:
- onCanonicalRequest(ctx, request)
- onCanonicalResponse(ctx, response)
- onStreamEvent(ctx, event)
- onImageOutput(ctx, output)
- onError(ctx, error)
The context includes requestId, callType, endpoint, publicModel, sanitized auth data,
extensionKey, instanceId, config, match, signal, and a structured logger. Deployment
credentials are never exposed.
Built-in match fields:
- models: public model names.
- callTypes: internal call types such as chat, images.generations, images.edits, embeddings, and audio.transcriptions.
- endpoints: public endpoint paths.
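As a rough mental model of how the three built-in match fields might narrow an instance, consider the sketch below. It rests on assumptions this document does not confirm: that fields combine with AND, that values within a field combine with OR, and that an omitted field matches everything. `matchesInstance` and the model name are hypothetical.

```javascript
// ASSUMPTION: AND across fields, OR within a field, omitted field = match all.
// The document does not specify the gateway's actual matching semantics.
function matchesInstance(match, req) {
  const allows = (list, value) => list === undefined || list.includes(value);
  return (
    allows(match.models, req.publicModel) &&
    allows(match.callTypes, req.callType) &&
    allows(match.endpoints, req.endpoint)
  );
}

const match = {
  callTypes: ["chat"],
  endpoints: ["/v1/chat/completions", "/v1/messages"],
};

const chatCall = { publicModel: "example-model", callType: "chat", endpoint: "/v1/messages" };
const embedCall = { publicModel: "example-model", callType: "embeddings", endpoint: "/v1/messages" };

console.log(matchesInstance(match, chatCall));  // true
console.log(matchesInstance(match, embedCall)); // false
```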
Failures
Invalid manifest JSON, duplicate definitions, invalid module exports, or a critical instance with invalid config fail startup. Non-critical invalid instances are disabled and reported in status.
At runtime, hook failures return a sanitized gateway error for that request. After
UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES consecutive failures, the instance is disabled for the current
process. If a disabled instance is critical, affected requests fail with extension_disabled and
/health returns unhealthy. Non-critical disabled instances are skipped and reported as degraded.
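The failure policy above behaves like a per-instance circuit breaker. The following sketch models that bookkeeping under stated assumptions: the `FailureTracker` class, the `statusFor` function, and the "ok" label are hypothetical names, and the reset-on-success behavior is inferred from the word "consecutive" rather than confirmed by the source.

```javascript
// Hypothetical bookkeeping for one instance; names are illustrative only.
class FailureTracker {
  constructor(maxFailures) {
    this.maxFailures = maxFailures;
    this.consecutive = 0;
    this.disabled = false;
  }
  recordSuccess() {
    // "Consecutive" implies a success resets the streak (assumed here).
    this.consecutive = 0;
  }
  recordFailure() {
    this.consecutive += 1;
    if (this.consecutive >= this.maxFailures) this.disabled = true;
  }
}

// What a disabled instance contributes to health, per the rules above.
function statusFor(tracker, critical) {
  if (!tracker.disabled) return "ok";
  return critical ? "unhealthy" : "degraded";
}

const tracker = new FailureTracker(3); // UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES=3
tracker.recordFailure();
tracker.recordFailure();
tracker.recordSuccess(); // streak broken before reaching 3
tracker.recordFailure();
tracker.recordFailure();
const disabledAfterBrokenStreak = tracker.disabled;
tracker.recordFailure(); // third consecutive failure
console.log(disabledAfterBrokenStreak);                                       // false
console.log(tracker.disabled, statusFor(tracker, false), statusFor(tracker, true)); // true degraded unhealthy
```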
Each hook runs under a wall-clock budget (UNIFIED_GATEWAY_EXTENSION_HOOK_TIMEOUT_MS, default 5000 ms;
set 0 to disable). A hook that exceeds it, or one that ignores its ctx.signal while the client
cancels, is aborted and counts as a failure, so a misbehaving hook can never block request
processing indefinitely. Well-behaved hooks should honor ctx.signal, which fires on both timeout
and upstream cancellation.
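One common way to enforce such a budget is to race the hook against a timer and abort a shared signal when the timer wins. The sketch below shows that general technique together with a hook that honors `ctx.signal`. `runWithBudget` and `slowHook` are hypothetical names, and this is not necessarily how the gateway implements its timeout.

```javascript
// Hypothetical runner: races a hook against a timer and aborts its signal.
function runWithBudget(hook, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    if (timeoutMs > 0) { // 0 disables the budget, mirroring the env setting
      timer = setTimeout(() => {
        reject(new Error("hook timed out"));
        controller.abort(); // lets a cooperative hook stop its own work
      }, timeoutMs);
    }
  });
  // Promise.resolve().then(...) turns a synchronous throw into a rejection.
  const work = Promise.resolve().then(() => hook({ signal: controller.signal }));
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// A well-behaved hook: stops waiting as soon as ctx.signal fires.
function slowHook(ctx) {
  return new Promise((resolve, reject) => {
    const pending = setTimeout(() => resolve("done"), 1000);
    ctx.signal.addEventListener("abort", () => {
      clearTimeout(pending); // release resources promptly
      reject(new Error("aborted"));
    });
  });
}

// The hook needs 1000 ms but only gets 20 ms, so the budget wins.
const outcome = runWithBudget(slowHook, 20).then(
  () => "completed",
  (err) => err.message
);
outcome.then((message) => console.log(message)); // hook timed out
```

Because `slowHook` clears its own timer when the signal fires, no work lingers after the budget expires; a hook that ignored the signal would keep running in the background even though its result is discarded.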
onError is a fire-and-forget observability hook: a failure inside it is logged and surfaced in
status but never trips the circuit breaker, so a logging glitch cannot disable a critical instance.
An instance disabled by the circuit breaker can be re-activated for the current process without a
restart via POST /admin/extensions/{id}/reset. Instances disabled by configuration, load-time
validation, or setup are not eligible (their underlying problem persists) and the endpoint returns
bad_request.
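For operators scripting this, the sketch below builds the reset request without sending it. The base URL, port, and any authentication headers depend on the deployment and are assumptions here; `buildResetRequest` is a hypothetical helper, and the response body format is not described in this document.

```javascript
// Hypothetical helper: builds the reset request; it does not send anything.
// baseUrl and headers are deployment-specific assumptions.
function buildResetRequest(baseUrl, instanceId, headers = {}) {
  // Encode the id so unusual characters cannot alter the path.
  const path = `/admin/extensions/${encodeURIComponent(instanceId)}/reset`;
  return { url: new URL(path, baseUrl).href, options: { method: "POST", headers } };
}

const request = buildResetRequest("http://localhost:8080", "chat-defaults-general");
console.log(request.url);
// http://localhost:8080/admin/extensions/chat-defaults-general/reset

// To send it from an async context:
//   const res = await fetch(request.url, request.options);
```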
Image Outputs
Unified Gateway re-encodes returned PNG, JPEG, and WebP files to strip upstream metadata by default. It
does not stamp product or owner metadata in core. Operators that want image post-processing can mount
a private extension that uses onImageOutput.