Operations
Deployment, health probes, migrations, and graceful shutdown.
Operational runbook for running Unified Gateway in production: deploying, wiring health probes, rotating
secrets, managing request_logs partitions, and shipping telemetry. If you are still getting the
gateway running locally, start with the Quickstart instead — this page assumes a
real deployment.
Health and lifecycle
For how to deploy — Docker Compose, Coolify, Portainer, Dokploy, or a Linux VPS — see Deployment. This section covers the runtime behavior you wire into your orchestrator.
The app runs TypeScript directly on Bun — there is no build step. Run
bun run --filter @boelabs/unified-gateway db:migrate as a one-off before rolling out new
instances; migrations are generated from schema.ts with drizzle-kit and are forward-only — never
edit an already-applied migration.
Health probes — two probes with different jobs; wire each to the matching orchestrator probe:
- GET /health/live: liveness. Always 200 while the process responds; does not touch Postgres/Redis. Use it for the liveness probe and the container healthcheck. Never point a liveness probe at a dependency-aware endpoint: a Postgres/Redis blip would otherwise restart every replica at once (a restart cannot fix the dependency) and turn a blip into an outage.
- GET /health/ready: readiness. 200 when Postgres, Redis and the extension runtime are healthy; otherwise 503 with Retry-After. Use it for the readiness probe so an unhealthy instance is pulled from the load balancer without being restarted, and rejoins automatically once dependencies recover.
- GET /health: alias of /health/ready, kept for backward compatibility.
- During a dependency outage, in-flight inference requests return 503 + Retry-After (not an opaque 500), so well-behaved clients back off and retry.
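For clients or deploy scripts that poll readiness, the 503 + Retry-After contract can be interpreted roughly as follows. This is a minimal sketch, not the gateway's own code: the helper names and the 1000 ms fallback are assumptions, and the only facts taken from this page are the 200/503 status codes and the Retry-After header.

```typescript
// Sketch only: helper names and the fallback delay are hypothetical.
type ProbeResult = { ready: boolean; waitMs: number };

// Retry-After is either delta-seconds ("5") or an HTTP-date (RFC 9110).
function retryAfterMs(header: string | null, fallbackMs: number, nowMs: number): number {
  if (header === null) return fallbackMs;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const at = Date.parse(value);
  if (Number.isNaN(at)) return fallbackMs;
  // A date in the past means "retry now".
  return Math.max(0, at - nowMs);
}

// Turn one /health/ready response into a decision: ready, or wait this long.
function interpretReady(status: number, retryAfter: string | null, nowMs: number): ProbeResult {
  if (status === 200) return { ready: true, waitMs: 0 };
  return { ready: false, waitMs: retryAfterMs(retryAfter, 1000, nowMs) };
}
```

With a fetch-capable runtime such as Bun, a caller would pass the response's status and the value of its retry-after header, then sleep for waitMs before polling again.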
Shutdown — the process handles SIGTERM/SIGINT, stops accepting traffic, drains in-flight HTTP,
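then closes Redis/Postgres and flushes OpenTelemetry. Set the orchestrator's stop grace period (for example stop_grace_period in Docker Compose or terminationGracePeriodSeconds in Kubernetes) to at least SHUTDOWN_TIMEOUT_MS so the container is not killed mid-drain.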
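Orchestrators usually express the grace period in whole seconds while SHUTDOWN_TIMEOUT_MS is in milliseconds, so a small conversion helps avoid rounding the wrong way. The sketch below is illustrative: the function name and the 5-second safety margin are assumptions, not gateway settings.

```typescript
// Sketch only: stopGraceSeconds and its default margin are hypothetical.
// Rounds up so the orchestrator never waits less than the drain timeout.
function stopGraceSeconds(shutdownTimeoutMs: number, marginMs: number = 5_000): number {
  if (!Number.isFinite(shutdownTimeoutMs) || shutdownTimeoutMs < 0) {
    throw new RangeError("shutdownTimeoutMs must be a non-negative number");
  }
  return Math.ceil((shutdownTimeoutMs + marginMs) / 1000);
}
```

For an example SHUTDOWN_TIMEOUT_MS of 30000, this yields a 35-second grace period.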
Secrets
Two secrets are critical. Store them in your secret manager (not in the image, not in git):
| Secret | Purpose | Format |
|---|---|---|
| MASTER_KEY | Root admin credential; grants full access to /admin/*. | Strong random string (≥ 8 chars; use much longer). |
| CREDENTIALS_ENCRYPTION_KEY | Encrypts provider credentials at rest (AES-256-GCM). | 32 bytes in hex (64 chars). |
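The formats in the table can be checked before a rollout, for instance in a deploy script. The following is a minimal sketch under stated assumptions: validateSecrets is a hypothetical helper, and accepting uppercase hex digits is a guess about what the gateway tolerates.

```typescript
// Sketch only: validateSecrets is a hypothetical pre-deploy check,
// not part of the gateway. It mirrors the formats in the table above.
function validateSecrets(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];

  const masterKey = env.MASTER_KEY ?? "";
  if (masterKey.length < 8) {
    problems.push("MASTER_KEY must be at least 8 characters");
  }

  // 32 bytes in hex is exactly 64 hex digits (case-insensitivity is assumed).
  const encryptionKey = env.CREDENTIALS_ENCRYPTION_KEY ?? "";
  if (!/^[0-9a-fA-F]{64}$/.test(encryptionKey)) {
    problems.push("CREDENTIALS_ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  }

  return problems;
}
```

An empty result means both values have the expected shape; it says nothing about how strong or how unique they are.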
Generate values:
```bash
# MASTER_KEY
openssl rand -base64 48
# CREDENTIALS_ENCRYPTION_KEY (32 bytes hex)
openssl rand -hex 32
```
Rotating MASTER_KEY
The master key is a single static secret read from the environment. It is not stored in the database, so rotation is a deploy-time operation:
- Generate a new strong value.
- Update MASTER_KEY in your secret manager.
- Roll the deployment so every instance picks up the new value.
- Update any admin tooling/CI that authenticated with the old key.
Virtual keys are unaffected — they live in the database and are not derived from the master key.
Rotating CREDENTIALS_ENCRYPTION_KEY
Provider credentials are encrypted with this key, so it cannot be swapped by simply changing the env var — existing ciphertext would no longer decrypt. Rotate by re-encrypting:
- Keep the current CREDENTIALS_ENCRYPTION_KEY in place.
- For each deployment, re-submit its credentials via PATCH /admin/deployments/:id with the plaintext API key. The gateway re-encrypts on write, so this is also the moment to switch keys if you maintain a key-versioning wrapper.
- Because credentials are write-only (the admin API never returns them), rotation requires having the plaintext provider keys on hand. Keep them in your secret manager so you can re-submit.
- Once every deployment has been re-encrypted, retire the old key.
If you operate at scale, prefer storing provider credentials in an external secret manager and giving the gateway short-lived access, so encryption-key rotation never requires re-submitting every credential by hand.
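To see why changing the env var alone breaks existing ciphertext, the sketch below encrypts a value with AES-256-GCM and shows that only the original key decrypts it. It uses Node's crypto module (also available on Bun). The storage layout (IV, then auth tag, then ciphertext, hex-encoded) and the function names are assumptions for illustration; the gateway's actual format is not described on this page.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch only. Hypothetical layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext.
function encrypt(plaintext: string, keyHex: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", Buffer.from(keyHex, "hex"), iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("hex");
}

function decrypt(blobHex: string, keyHex: string): string {
  const blob = Buffer.from(blobHex, "hex");
  const decipher = createDecipheriv("aes-256-gcm", Buffer.from(keyHex, "hex"), blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]).toString("utf8");
}

// Re-encryption needs the old key (or the plaintext) at the moment of rotation.
function reencrypt(blobHex: string, oldKeyHex: string, newKeyHex: string): string {
  return encrypt(decrypt(blobHex, oldKeyHex), newKeyHex);
}
```

Because the admin API never returns credentials, the re-submit procedure above plays the role of reencrypt here: the plaintext comes from your secret manager rather than from decrypting the stored value.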
Leaked-secret response
If MASTER_KEY leaks: rotate it immediately (above) and audit request_logs / admin access. If a
provider API key leaks, revoke it at the provider and PATCH the affected deployments with a new
key.
request_logs partitioning
request_logs is partitioned by day. A background job runs every
REQUEST_LOG_PARTITION_JOB_INTERVAL_MS and:
- Drains the DEFAULT partition into daily partitions. Rows can land in DEFAULT before their day's partition exists (e.g. the first requests after a cold start); left there they are never cleaned by retention, and they also block creation of that day's partition. The drain is a no-op when DEFAULT is empty.
- Creates partitions for today plus REQUEST_LOG_PARTITION_CREATE_DAYS days ahead.
- Drops partitions older than REQUEST_LOG_PARTITION_RETENTION_DAYS.
This is fully automatic. Across replicas a Postgres advisory lock ensures only one instance runs
maintenance per cycle (the drain's DETACH/ATTACH must not run concurrently).
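The create and drop steps amount to simple date arithmetic, which can be sketched as a pure function. This is an illustration rather than the gateway's job: it assumes UTC day boundaries, identifies partitions by ISO date, and treats a partition as expired when its day is strictly earlier than today minus the retention window. All three are assumptions.

```typescript
// Sketch only: planPartitions and its boundary rules are hypothetical.
const DAY_MS = 86_400_000;

function utcDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10); // "YYYY-MM-DD"
}

function planPartitions(
  now: Date,
  createDays: number,    // stands in for REQUEST_LOG_PARTITION_CREATE_DAYS
  retentionDays: number, // stands in for REQUEST_LOG_PARTITION_RETENTION_DAYS
  existingDays: string[],
): { create: string[]; drop: string[] } {
  const todayMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  // Today plus createDays days ahead.
  const wanted: string[] = [];
  for (let i = 0; i <= createDays; i++) wanted.push(utcDay(todayMs + i * DAY_MS));

  const existing = new Set(existingDays);
  const cutoff = utcDay(todayMs - retentionDays * DAY_MS);

  return {
    create: wanted.filter((day) => !existing.has(day)),
    // ISO dates sort lexicographically, so string comparison is enough.
    drop: existingDays.filter((day) => day < cutoff),
  };
}
```

Running the planner twice with the same inputs gives the same answer, which is the property that makes a periodic job like this safe to repeat.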
response_states GC
Expired /v1/responses state rows (written when store=true) are deleted automatically by an in-app
job every RESPONSE_STATE_GC_INTERVAL_MS; an opportunistic prune on write traffic covers the gaps
between ticks.
Backups
Back up Postgres; Redis is disposable. Postgres is the source of truth — deployments, virtual keys, request logs, response states, and router settings. Redis only holds ephemeral runtime state (cooldowns, in-flight counters, rate-limit windows, the response cache), which rebuilds itself, so it needs no backup.
The bundled Postgres uses the standard postgres image and a named volume (pgdata) — exactly what
self-hosting platforms back up:
- Coolify recognizes the postgres service and can schedule logical (pg_dump) backups to S3-compatible storage with retention. Restoring through Coolify's UI is only available for its standalone managed databases, not Compose services; restore a Compose backup manually with pg_restore/psql. For one-click backup and restore, deploy Postgres as a Coolify-managed database and point DATABASE_URL at it over the private network.
- Dokploy schedules pg_dump backups (to S3) for databases inside a Compose app as well as standalone ones, with restore.
- Portainer has no database-aware backup: use a pg_dump cron, a volume-backup sidecar (e.g. offen/docker-volume-backup), or back up the pgdata volume at the host.
Manual logical backup/restore (works anywhere):
```bash
# Backup
docker compose exec -T postgres pg_dump -U gateway -d unifiedgateway --no-owner | gzip > backup.sql.gz
# Restore into an empty database
gunzip -c backup.sql.gz | docker compose exec -T postgres psql -U gateway -d unifiedgateway
```
Prefer pg_dump (logical) backups: they are consistent without stopping the gateway. A raw pgdata volume snapshot is only consistent if Postgres is stopped or the platform uses a consistent-snapshot method.
Cost accounting
spend_cents is updated in Postgres on every billed virtual-key request; Redis is the hot-path
counter used for budget/limit enforcement. Pricing comes from the model catalog (or a deployment's
pricing override); models without pricing are logged with zero cost.
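The pricing rules can be sketched as a small calculation. The shape below is an assumption made for illustration: per-million-token prices in USD, separate input and output rates, and the deployment override taking precedence over the catalog. Only the zero-cost behavior for unpriced models comes from this page.

```typescript
// Sketch only: these types and the per-million-token unit are hypothetical.
type Pricing = { inputPerMillionUsd: number; outputPerMillionUsd: number };
type Usage = { inputTokens: number; outputTokens: number };

// A deployment's override, when present, is assumed to win over the catalog.
function resolvePricing(catalog?: Pricing, override?: Pricing): Pricing | undefined {
  return override ?? catalog;
}

// Models without pricing are logged with zero cost.
function costCents(usage: Usage, pricing?: Pricing): number {
  if (!pricing) return 0;
  const usd =
    (usage.inputTokens * pricing.inputPerMillionUsd +
      usage.outputTokens * pricing.outputPerMillionUsd) /
    1_000_000;
  return usd * 100;
}
```

A real ledger would likely keep integer sub-cent units rather than floating-point cents; floats are used here only to keep the arithmetic readable.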
Observability
- Logs: structured JSON, one object per line, on stdout/stderr (see src/logging/log.ts). Ship them to your log collector.
- Telemetry: set OTEL_ENABLED=true and OTEL_EXPORTER_OTLP_ENDPOINT to export traces and metrics. OTEL_LOG_PAYLOADS=true attaches full request/response/error payloads to span events (DB logs remain truncated by MAX_STRING_LENGTH_PROMPT_IN_DB).
- Request correlation: every response carries x-request-id (accepts an inbound x-request-id).
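Because logs are one JSON object per line, correlating them with a request ID takes only a few lines when a full log collector is not at hand. This is a minimal sketch: the requestId field name is a guess, so check the actual field emitted by src/logging/log.ts before relying on it.

```typescript
// Sketch only: the "requestId" field name is an assumption.
type LogRecord = Record<string, unknown>;

// Parse newline-delimited JSON, skipping blank lines and anything that is not an object.
function parseLogLines(text: string): LogRecord[] {
  const records: LogRecord[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value === "object" && value !== null && !Array.isArray(value)) {
        records.push(value as LogRecord);
      }
    } catch {
      // Non-JSON output (for example a stray stack trace line) is ignored.
    }
  }
  return records;
}

// Keep only the records that belong to one request.
function byRequestId(records: LogRecord[], requestId: string, field: string = "requestId"): LogRecord[] {
  return records.filter((record) => record[field] === requestId);
}
```

The ID to filter on is the x-request-id value returned with the response you are investigating.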
Dependency audit
bun audit --production is expected to be clean for the production dependency tree, and CI enforces it
at --audit-level=high.