Unified Gateway

Operations

Deployment, health probes, migrations, and graceful shutdown.

Operational runbook for running Unified Gateway in production: deploying, wiring health probes, rotating secrets, managing request_logs partitions, and shipping telemetry. If you are still getting the gateway running locally, start with the Quickstart instead — this page assumes a real deployment.

Health and lifecycle

For how to deploy — Docker Compose, Coolify, Portainer, Dokploy, or a Linux VPS — see Deployment. This section covers the runtime behavior you wire into your orchestrator.

The app runs TypeScript directly on Bun — there is no build step. Run bun run --filter @boelabs/unified-gateway db:migrate as a one-off before rolling out new instances; migrations are generated from schema.ts with drizzle-kit and are forward-only — never edit an already-applied migration.

Health probes — two probes with different jobs; wire each to the matching orchestrator probe:

  • GET /health/live: liveness. Always 200 while the process responds; does not touch Postgres/Redis. Use it for the liveness probe and the container healthcheck. Never point a liveness probe at a dependency-aware endpoint: a Postgres/Redis blip would otherwise restart every replica at once (a restart cannot fix the dependency) and turn a blip into an outage.
  • GET /health/ready: readiness. 200 when Postgres, Redis and the extension runtime are healthy; otherwise 503 with Retry-After. Use it for the readiness probe so an unhealthy instance is pulled from the load balancer without being restarted, and rejoins automatically once dependencies recover.
  • GET /health — alias of /health/ready, kept for backward compatibility.
  • During a dependency outage, in-flight inference requests return 503 + Retry-After (not an opaque 500), so well-behaved clients back off and retry.
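
As a rough illustration of the split above, a container healthcheck can call only the liveness endpoint, while a rollout script waits on readiness. This is a minimal sketch, not the project's own tooling: GATEWAY_URL, its port 3000 default, and the function names are assumptions made for the example.

```shell
# GATEWAY_URL and its default port are assumptions for this sketch.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:3000}"

# Only a 200 counts as healthy; 503 (or a failed request) does not.
probe_ok() {
  [ "$1" = "200" ]
}

# Print just the HTTP status code for a path; 000 if the request fails.
http_status() {
  local status
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "${GATEWAY_URL}$1")" || status=000
  printf '%s\n' "$status"
}

# Liveness: dependency-free, suitable for a container healthcheck.
check_live() {
  probe_ok "$(http_status /health/live)"
}

# Readiness: poll once per second until healthy or attempts run out.
wait_until_ready() {
  local attempts="${1:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    probe_ok "$(http_status /health/ready)" && return 0
    sleep 1
    i=$(( i + 1 ))
  done
  return 1
}
```

A deploy script could call wait_until_ready 60 after starting a new instance and abort the rollout if it returns non-zero; check_live is the only function a restart-triggering healthcheck should use.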

Shutdown — the process handles SIGTERM/SIGINT, stops accepting traffic, drains in-flight HTTP, then closes Redis/Postgres and flushes OpenTelemetry. Give the container at least SHUTDOWN_TIMEOUT_MS to drain.
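
The grace period given to the orchestrator has to cover the drain window, otherwise the container is killed mid-drain. A minimal sketch of deriving it from SHUTDOWN_TIMEOUT_MS follows; the helper name and the 5-second safety margin are assumptions, not gateway defaults.

```shell
# Convert a drain timeout in milliseconds to whole seconds (rounded up)
# and add a safety margin. The 5 s default margin is an assumption.
stop_grace_seconds() {
  local timeout_ms="$1" margin_s="${2:-5}"
  echo $(( (timeout_ms + 999) / 1000 + margin_s ))
}

# Example (not run here): stop a container with a matching grace period.
# docker stop -t "$(stop_grace_seconds "$SHUTDOWN_TIMEOUT_MS")" <container>
```

The same number can be used for stop_grace_period in Docker Compose or terminationGracePeriodSeconds in Kubernetes.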

Secrets

Two secrets are critical. Store them in your secret manager (not in the image, not in git):

Secret | Purpose | Format
MASTER_KEY | Root admin credential; grants full access to /admin/*. | Strong random string (≥ 8 chars; use much longer).
CREDENTIALS_ENCRYPTION_KEY | Encrypts provider credentials at rest (AES-256-GCM). | 32 bytes in hex (64 chars).

Generate values:

# MASTER_KEY
openssl rand -base64 48

# CREDENTIALS_ENCRYPTION_KEY (32 bytes hex)
openssl rand -hex 32
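
Before a rollout it can help to check that both values match the formats in the table. The functions below are a hypothetical pre-deploy check, not part of the gateway; in particular, accepting uppercase hex is an assumption (openssl rand -hex emits lowercase).

```shell
# MASTER_KEY: the documented minimum is 8 characters.
is_valid_master_key() {
  [ "${#1}" -ge 8 ]
}

# CREDENTIALS_ENCRYPTION_KEY: exactly 64 hex characters (32 bytes).
is_valid_encryption_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;  # any non-hex character is rejected
  esac
  return 0
}

# Example (not run here):
# is_valid_encryption_key "$CREDENTIALS_ENCRYPTION_KEY" || echo "bad key format" >&2
```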

Rotating MASTER_KEY

The master key is a single static secret read from the environment. It is not stored in the database, so rotation is a deploy-time operation:

  1. Generate a new strong value.
  2. Update MASTER_KEY in your secret manager.
  3. Roll the deployment so every instance picks up the new value.
  4. Update any admin tooling/CI that authenticated with the old key.

Virtual keys are unaffected — they live in the database and are not derived from the master key.

Rotating CREDENTIALS_ENCRYPTION_KEY

Provider credentials are encrypted with this key, so it cannot be swapped by simply changing the env var — existing ciphertext would no longer decrypt. Rotate by re-encrypting:

  1. Keep the current CREDENTIALS_ENCRYPTION_KEY in place.
  2. For each deployment, re-submit its credentials via PATCH /admin/deployments/:id with the plaintext API key. The gateway re-encrypts on write, so this is also the moment to switch keys if you maintain a key-versioning wrapper.
  3. Because credentials are write-only (the admin API never returns them), rotation requires having the plaintext provider keys on hand. Keep them in your secret manager so you can re-submit.
  4. Once every deployment has been re-encrypted, retire the old key.

If you operate at scale, prefer storing provider credentials in an external secret manager and giving the gateway short-lived access, so encryption-key rotation never requires re-submitting every credential by hand.

Leaked-secret response

If MASTER_KEY leaks: rotate it immediately (above) and audit request_logs / admin access. If a provider API key leaks, revoke it at the provider and PATCH the affected deployments with a new key.

request_logs partitioning

request_logs is partitioned by day. A background job runs every REQUEST_LOG_PARTITION_JOB_INTERVAL_MS and:

  • Drains the DEFAULT partition into daily partitions. Rows can land in DEFAULT before their day's partition exists (e.g. the first requests after a cold start); left there they are never cleaned by retention, and they also block creation of that day's partition. The drain is a no-op when DEFAULT is empty.
  • Creates partitions for today plus REQUEST_LOG_PARTITION_CREATE_DAYS days ahead.
  • Drops partitions older than REQUEST_LOG_PARTITION_RETENTION_DAYS.

This is fully automatic. Across replicas a Postgres advisory lock ensures only one instance runs maintenance per cycle (the drain's DETACH/ATTACH must not run concurrently).
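
To reason about which days are covered for a given configuration, the create-ahead and retention windows can be worked out by hand. This sketch only computes dates; it does not reflect the gateway's partition naming or its exact boundary rule, and it relies on GNU date's -d option.

```shell
# Print the days (YYYY-MM-DD) that should have a partition: `today` plus
# `ahead` further days, mirroring REQUEST_LOG_PARTITION_CREATE_DAYS.
partition_days() {
  local today="$1" ahead="$2" i=0
  while [ "$i" -le "$ahead" ]; do
    date -u -d "$today +$i days" +%F
    i=$(( i + 1 ))
  done
}

# Print the retention cutoff day. Treating partitions strictly before it
# as droppable is this sketch's reading of "older than", not a confirmed rule.
retention_cutoff() {
  date -u -d "$1 -$2 days" +%F
}

# Example (not run here):
# partition_days "$(date -u +%F)" "$REQUEST_LOG_PARTITION_CREATE_DAYS"
```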

response_states GC

Expired /v1/responses state rows (written when store=true) are deleted automatically by an in-app job every RESPONSE_STATE_GC_INTERVAL_MS; an opportunistic prune on write traffic covers the gaps between ticks.

Backups

Back up Postgres; Redis is disposable. Postgres is the source of truth — deployments, virtual keys, request logs, response states, and router settings. Redis only holds ephemeral runtime state (cooldowns, in-flight counters, rate-limit windows, the response cache), which rebuilds itself, so it needs no backup.

The bundled Postgres uses the standard postgres image and a named volume (pgdata) — exactly what self-hosting platforms back up:

  • Coolify recognizes the postgres service and can schedule logical (pg_dump) backups to S3-compatible storage with retention. Restoring through Coolify's UI is only available for its standalone managed databases, not Compose services — restore a Compose backup manually with pg_restore / psql. For one-click backup and restore, deploy Postgres as a Coolify-managed database and point DATABASE_URL at it over the private network.
  • Dokploy schedules pg_dump backups (to S3) for databases inside a Compose app as well as standalone ones, with restore.
  • Portainer has no database-aware backup: use a pg_dump cron, a volume-backup sidecar (e.g. offen/docker-volume-backup), or back up the pgdata volume at the host.

Manual logical backup/restore (works anywhere):

# Backup
docker compose exec -T postgres pg_dump -U gateway -d unifiedgateway --no-owner | gzip > backup.sql.gz

# Restore into an empty database
gunzip -c backup.sql.gz | docker compose exec -T postgres psql -U gateway -d unifiedgateway

Prefer pg_dump (logical) backups: they are consistent without stopping the gateway. A raw pgdata volume snapshot is only consistent if Postgres is stopped or the platform uses a consistent-snapshot method.
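
For a scheduled job, one possible wrapper around the backup command above is sketched here. The file naming scheme and function names are assumptions; the pg_dump invocation is the one shown earlier. It needs bash, because pipefail is what stops a pg_dump failure from being hidden by gzip exiting successfully.

```shell
# Build a timestamped backup file name (naming scheme is an assumption).
backup_name() {
  printf 'unifiedgateway-%s.sql.gz\n' "$1"
}

# Dump to a temporary file and rename only on success, so a failed
# pg_dump never leaves a truncated file that looks like a valid backup.
run_backup() {
  local dir="$1" out
  out="$dir/$(backup_name "$(date -u +%Y%m%dT%H%M%SZ)")"
  if ( set -o pipefail
       docker compose exec -T postgres pg_dump -U gateway -d unifiedgateway --no-owner \
         | gzip > "$out.partial" ); then
    mv "$out.partial" "$out"
  else
    rm -f "$out.partial"
    return 1
  fi
}

# Example (not run here): run_backup /var/backups/gateway
```

Running gzip -t on the resulting file is a cheap integrity check, though only a test restore confirms the backup is usable.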

Cost accounting

spend_cents is updated in Postgres on every billed virtual-key request; Redis is the hot-path counter used for budget/limit enforcement. Pricing comes from the model catalog (or a deployment's pricing override); models without pricing are logged with zero cost.

Observability

  • Logs: structured JSON, one object per line, on stdout/stderr (see src/logging/log.ts). Ship them to your log collector.
  • Telemetry: set OTEL_ENABLED=true and OTEL_EXPORTER_OTLP_ENDPOINT to export traces and metrics. OTEL_LOG_PAYLOADS=true attaches full request/response/error payloads to span events (DB logs remain truncated by MAX_STRING_LENGTH_PROMPT_IN_DB).
  • Request correlation: every response carries x-request-id (accepts an inbound x-request-id).
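
When correlating a client-side failure with gateway logs, the x-request-id can be pulled out of the response headers. A small sketch, assuming curl is available and using a hypothetical GATEWAY_URL variable in the usage line:

```shell
# Read raw HTTP response headers on stdin and print the x-request-id value.
# Header names are case-insensitive, so the comparison is done in lowercase.
extract_request_id() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-request-id" { print $2; exit }'
}

# Example (not run here): send an inbound id and read back what the
# gateway reports for the request.
# curl -s -D - -o /dev/null -H 'x-request-id: trace-42' \
#   "$GATEWAY_URL/health/live" | extract_request_id
```

The printed value can then be searched for in the JSON logs or in trace data.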

Dependency audit

bun audit --production is expected to be clean for the production dependency tree, and CI enforces it at --audit-level=high.
