Unified Gateway

Runtime extensions

Hooks, manifest format, failure policy, and the extension SDK.

Runtime extensions let an operator attach trusted behavior to Unified Gateway without forking the repository or rebuilding the official image. They are designed for Docker, Portainer, Coolify, Dokploy, and similar environments: mount extension files as a read-only volume and point the gateway to a manifest.

Extensions are split into two concepts:

  • Definitions: trusted ESM modules installed or mounted by the operator.
  • Instances: manifest entries that configure a definition with match, config, priority, and failure policy.

Admins can inspect extension status through GET /admin/extensions, but v1 does not allow editing or uploading extension code through the Admin API.

Manifest

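Set the following environment variables. The first points the gateway at the manifest; the other two tune the failure policy described under Failures: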

UNIFIED_GATEWAY_EXTENSIONS_MANIFEST=/app/extensions/extensions.json
UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES=3
UNIFIED_GATEWAY_EXTENSION_HOOK_TIMEOUT_MS=5000

Example manifest:

{
  "modules": [{ "path": "./chat-defaults.mjs" }],
  "instances": [
    {
      "id": "chat-defaults-general",
      "definition": "chatdefaults",
      "enabled": true,
      "priority": 50,
      "critical": false,
      "match": {
        "callTypes": ["chat"]
      },
      "config": {
        "temperature": 0.2,
        "maxTokens": 1024
      }
    }
  ]
}

Paths are resolved relative to the manifest file.
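As a rough illustration of what "relative to the manifest file" means, the sketch below resolves a modules[].path entry against the manifest's location. The helper name resolveModulePath is hypothetical, and standard URL resolution is used here as a stand-in; the gateway's internal resolver may differ in details such as Windows paths or percent-encoding.

```javascript
// Sketch only: resolve a manifest "modules[].path" entry against the
// manifest's own location. resolveModulePath is a hypothetical helper,
// not part of the gateway or its SDK. POSIX-style absolute paths assumed.
function resolveModulePath(manifestPath, modulePath) {
  const base = new URL(`file://${manifestPath}`);
  return new URL(modulePath, base).pathname;
}

const resolved = resolveModulePath(
  "/app/extensions/extensions.json",
  "./chat-defaults.mjs"
);
console.log(resolved); // /app/extensions/chat-defaults.mjs
```

With the Compose pattern shown on this page, that means chat-defaults.mjs only needs to sit next to extensions.json inside the mounted ./extensions directory.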

Docker Compose pattern:

services:
  unifiedgateway:
    image: ghcr.io/boelabs/unified-gateway:1
    volumes:
      - ./extensions:/app/extensions:ro
    environment:
      UNIFIED_GATEWAY_EXTENSIONS_MANIFEST: /app/extensions/extensions.json
      UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES: 3

Module API

Extension modules export a definition. Mounted modules under /app/extensions can import the SDK through the package import map:

import { defineExtension } from "#extensions/sdk.ts";

export default defineExtension({
  key: "chatdefaults",
  version: "1.0.0",
  label: "Chat defaults",
  description: "Applies default generation parameters.",
  hooks: {
    onCanonicalRequest(ctx, request) {
      if (request.callType !== "chat") return request;
      return {
        ...request,
        temperature: request.temperature ?? ctx.config.temperature,
        maxTokens: request.maxTokens ?? ctx.config.maxTokens
      };
    }
  }
});
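Because a hook is an ordinary function of (ctx, request), its logic can be checked locally before it is mounted. The sketch below copies the hook body above into a standalone function and calls it with a hand-built context; the context here carries only config, which is a simplification of what the gateway actually passes.

```javascript
// Same logic as the onCanonicalRequest hook above, as a plain function so it
// can be called without the gateway or the SDK.
function applyChatDefaults(ctx, request) {
  if (request.callType !== "chat") return request;
  return {
    ...request,
    temperature: request.temperature ?? ctx.config.temperature,
    maxTokens: request.maxTokens ?? ctx.config.maxTokens
  };
}

// Hypothetical, minimal context: only the config field is populated.
const ctx = { config: { temperature: 0.2, maxTokens: 1024 } };

// The caller set temperature but not maxTokens, so only maxTokens is filled.
const chat = applyChatDefaults(ctx, { callType: "chat", temperature: 0.9 });
console.log(chat.temperature, chat.maxTokens); // 0.9 1024

// Non-chat requests pass through unchanged (same object reference).
const embed = { callType: "embeddings" };
console.log(applyChatDefaults(ctx, embed) === embed); // true
```

Using ?? rather than || matters here: an explicit temperature of 0 from the caller is kept instead of being replaced by the configured default.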

The repository includes ready-to-mount examples in apps/gateway/examples/extensions (see its README for the full write-up). They cover every hook:

  • prompt-firewall.mjs: neutralizes or blocks prompt-injection attempts in inbound text (onCanonicalRequest).
  • pii-vault.mjs: tokenizes PII before it reaches the upstream model and restores it in the reply, including across streaming chunk boundaries (onCanonicalRequest + onCanonicalResponse + onStreamEvent + onError).
  • provenance-watermark.mjs: embeds an invisible zero-width provenance marker into assistant text (onCanonicalResponse + onStreamEvent).
  • tiered-image-watermark.mjs: stamps a visible preview watermark on images for non-privileged keys (onImageOutput).

Because hooks run after each public wire format is translated into Unified Gateway's canonical request, one callTypes: ["chat"] instance can affect all text wires: /v1/chat/completions, /v1/responses, and /v1/messages. Extension authors do not need separate branches for OpenAI-style messages, OpenAI Responses input/instructions, or Anthropic system.

Available hooks in v1:

  • onCanonicalRequest(ctx, request)
  • onCanonicalResponse(ctx, response)
  • onStreamEvent(ctx, event)
  • onImageOutput(ctx, output)
  • onError(ctx, error)

The context includes requestId, callType, endpoint, publicModel, sanitized auth data, extensionKey, instanceId, config, match, signal, and a structured logger. Deployment credentials are never exposed.

Built-in match fields:

  • models: public model names.
  • callTypes: internal call types such as chat, images.generations, images.edits, embeddings, and audio.transcriptions.
  • endpoints: public endpoint paths.
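One plausible reading of these fields is sketched below. It assumes that an omitted field matches everything and that all listed fields must match at once; neither rule is stated on this page, so treat matchesInstance as an illustration rather than the gateway's actual matcher.

```javascript
// Sketch under assumptions: omitted match fields act as wildcards, and every
// field that is present must contain the request's value (logical AND).
// matchesInstance is a hypothetical helper, not a gateway API.
function matchesInstance(match, ctx) {
  const allows = (allowed, value) =>
    allowed === undefined || allowed.includes(value);
  return (
    allows(match.models, ctx.publicModel) &&
    allows(match.callTypes, ctx.callType) &&
    allows(match.endpoints, ctx.endpoint)
  );
}

const match = { callTypes: ["chat"] };

// The same instance applies to different text wires with the same call type.
console.log(matchesInstance(match, { callType: "chat", endpoint: "/v1/chat/completions" })); // true
console.log(matchesInstance(match, { callType: "chat", endpoint: "/v1/messages" })); // true
console.log(matchesInstance(match, { callType: "embeddings" })); // false
```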

Failures

Startup fails if the manifest JSON is invalid, a definition is duplicated, a module export is invalid, or a critical instance has invalid config. Non-critical invalid instances are disabled and reported in status.
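The critical versus non-critical split for invalid instance config can be pictured as follows. This is a sketch of the policy only: planStartup and the configValid flag are hypothetical, and the real loader also checks manifest JSON, definitions, and module exports.

```javascript
// Sketch of the load-time policy for instance config. Each input item is a
// hypothetical summary { id, critical, configValid } produced by validation.
function planStartup(instances) {
  const invalid = instances.filter((inst) => !inst.configValid);
  const fatal = invalid.filter((inst) => inst.critical).map((inst) => inst.id);
  const disabled = invalid.filter((inst) => !inst.critical).map((inst) => inst.id);
  // Any invalid critical instance aborts startup; the rest are only disabled.
  return { startupFails: fatal.length > 0, fatal, disabled };
}

const plan = planStartup([
  { id: "chat-defaults-general", critical: false, configValid: false },
  { id: "audit-log", critical: true, configValid: true } // hypothetical id
]);
console.log(plan.startupFails); // false
console.log(plan.disabled.join(",")); // chat-defaults-general
```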

At runtime, hook failures return a sanitized gateway error for that request. After UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES consecutive failures, the instance is disabled for the current process. If a disabled instance is critical, affected requests fail with extension_disabled and /health returns unhealthy. Non-critical disabled instances are skipped and reported as degraded.
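The consecutive-failure rule behaves like a small circuit breaker. The sketch below is a minimal model of it, not the gateway's code: the class name is hypothetical, and it assumes that a successful hook call resets the counter, which is what "consecutive" implies but is not spelled out here.

```javascript
// Minimal model of a consecutive-failure breaker for one extension instance.
// InstanceBreaker is a hypothetical name; maxFailures corresponds to
// UNIFIED_GATEWAY_EXTENSION_MAX_FAILURES.
class InstanceBreaker {
  constructor(maxFailures) {
    this.maxFailures = maxFailures;
    this.consecutiveFailures = 0;
    this.disabled = false;
  }

  recordSuccess() {
    // Assumption: any success breaks the streak.
    this.consecutiveFailures = 0;
  }

  recordFailure() {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) {
      this.disabled = true; // stays disabled for the life of the process
    }
  }
}

const breaker = new InstanceBreaker(3);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordSuccess(); // streak broken after two failures
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.disabled); // false
breaker.recordFailure(); // third failure in a row
console.log(breaker.disabled); // true
```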

Each hook runs under a wall-clock budget (UNIFIED_GATEWAY_EXTENSION_HOOK_TIMEOUT_MS, default 5000 ms; set it to 0 to disable the timeout). A hook that exceeds the budget, or one that ignores its ctx.signal while the client cancels, is aborted and counts as a failure, so a misbehaving hook can never block request processing indefinitely. Well-behaved hooks should honor ctx.signal, which fires on both timeout and upstream cancellation.
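For hooks that loop over many items, honoring ctx.signal can be as simple as checking it between units of work. The sketch below assumes ctx.signal is a standard AbortSignal; the function name, the redaction logic, and the choice to throw a plain Error on abort are all illustrative.

```javascript
// Hypothetical hook body: processes texts one at a time and checks the abort
// signal between items so it stops promptly on timeout or cancellation.
function redactAll(ctx, texts) {
  const out = [];
  for (const text of texts) {
    if (ctx.signal.aborted) {
      throw new Error("hook aborted");
    }
    out.push(text.replaceAll("secret", "[redacted]"));
  }
  return out;
}

// Stand-in context built from a standard AbortController.
const controller = new AbortController();
const ctx = { signal: controller.signal };

console.log(JSON.stringify(redactAll(ctx, ["a secret", "b"]))); // ["a [redacted]","b"]

controller.abort(); // simulate a timeout or cancellation
let aborted = false;
try {
  redactAll(ctx, ["never processed"]);
} catch (err) {
  aborted = true;
}
console.log(aborted); // true
```

For asynchronous work, the same signal can usually be handed to the underlying API instead, for example as the signal option of fetch.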

onError is a fire-and-forget observability hook: a failure inside it is logged and surfaced in status but never trips the circuit breaker, so a logging glitch cannot disable a critical instance.

An instance disabled by the circuit breaker can be re-activated for the current process without a restart via POST /admin/extensions/{id}/reset. Instances disabled by configuration, load-time validation, or setup are not eligible (their underlying problem persists) and the endpoint returns bad_request.
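A reset call could be assembled as in the sketch below. Only the method and path come from this page: the base URL is a placeholder, buildResetRequest is a hypothetical helper, and whatever admin authentication the deployment requires still has to be added to the request.

```javascript
// Builds the reset request for a circuit-broken instance. The helper and the
// base URL are illustrative; admin authentication is deliberately left out
// because it is not described here.
function buildResetRequest(baseUrl, instanceId) {
  const path = `/admin/extensions/${encodeURIComponent(instanceId)}/reset`;
  return { url: new URL(path, baseUrl).toString(), method: "POST" };
}

const req = buildResetRequest("http://localhost:8080", "chat-defaults-general");
console.log(req.method, req.url);
// POST http://localhost:8080/admin/extensions/chat-defaults-general/reset

// To send it (not executed here):
//   await fetch(req.url, { method: req.method, headers: { /* admin auth */ } });
```

Because the path starts with a slash, it replaces any path already present in the base URL; a gateway served under a path prefix would need that prefix added to the path.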

Image Outputs

By default, Unified Gateway re-encodes returned PNG, JPEG, and WebP files to strip upstream metadata. The core gateway does not stamp product or owner metadata itself. Operators who want image post-processing can mount a private extension that uses onImageOutput.
