A lightweight, OpenAI-compatible API gateway written in Go that routes requests sequentially through configured providers until a successful response is received.
Build requirements: Go 1.25+ (see app/go.mod).
Born from Frustration: Created when Cloudflare AI Gateway unexpectedly started disconnecting users without explanation. This self-hosted alternative gives you full control with no vendor lock-in.
Daily Use Case: Connects to multiple AI providers with free tiers, automatically cycling between them when rate limits are hit - ensuring continuous service.
- Lightweight: Small stripped binary (~13MB typical local build) with minimal memory footprint
- Fast: Compiled Go with efficient runtime, no JVM overhead
- Reliable: Sequential provider fallback, automatic retry logic
- Simple: Single binary deployment, YAML configuration
- Secure: API key redaction, non-root execution, restrictive permissions
- OpenAI-Compatible: Drop-in replacement for the OpenAI API in tools like n8n. Just change the API base URL and make sure the requested model name matches one of your configured routes.
- Configure the gateway:

```bash
cp config.yaml.example config.yaml
# Edit config.yaml with your API keys
# See Configuration below
```

- Deploy locally:

```bash
./ops.sh build                    # Build binary
./ops.sh install-service          # Install as systemd service
sudo systemctl start ai-gateway   # Start service
```

- Or deploy remotely:

```bash
cp .env.example .env   # Configure SSH credentials
# Edit .env with your server details, or pass them inline on the command line
SSH_HOST=your-server.com ./ops.sh deploy
```

The commands below use `ops.sh` at the repo root (formerly `install.sh`).
- Local: `./ops.sh build` → `./ops.sh install-service` for development/production on the same machine
- Remote (systemd): `./ops.sh deploy` — SSH upload, remote installation, and systemd service setup
- Remote (Docker, CLI): `./ops.sh deploy-docker` — same as `./scripts/sync-config-to-vds.sh` (uploads `docker-compose.yml`, `config.yaml`, and a filtered `.env`; runs `docker compose pull && up -d` with the GHCR image and bind-mounted config)
- Binary-only: `./ops.sh install` — copies the binary only, no systemd service
For systemd deployments, use a reverse proxy like nginx or traefik to set up TLS termination and secure the traffic to your gateway.
Deploy as a container behind Traefik (or any reverse proxy) for HTTPS termination:
- Prerequisites: Docker on the remote server; Traefik with the `traefik-public` network
- Configure: ensure `config.yaml` and `.env` exist with your API keys (`GATEWAY_API_KEY`, provider keys)
- Deploy:

```bash
cp .env.example .env   # if needed
# Edit .env with SSH_HOST, SSH_USER, DOMAIN, and runtime vars (GATEWAY_API_KEY, etc.)
./ops.sh deploy-docker
```

- Domain: set `DOMAIN` in `.env` (e.g. `DOMAIN=ai-gateway.example.com`); docker-compose uses it for the Traefik Host rule.
- n8n integration: set Base URL to `https://YOUR_DOMAIN/v1` (your Traefik host), Model to a route name from `config.yaml`, and API Key to your `GATEWAY_API_KEY` value.
Pushes to `main` first run `go test ./... -count=1` in `app/`; if tests pass, the workflow builds the image on GitHub, pushes images to GHCR (`:main` and `:sha-<short>`), then SSHs into the VDS and runs `docker compose pull && docker compose up -d` in `/root/services/ai-gateway`.
Repository secrets (CI / deploy job)
| Secret | Purpose |
|---|---|
| `SSH_HOST` | VDS hostname or IP |
| `SSH_USER` | SSH user |
| `SSH_PRIVATE_KEY` | Private key (PEM) for that user |
| `SSH_PORT` | Optional; defaults to 22 if omitted |
| `GHCR_PULL_USER` | Optional; GitHub username for `docker login` on the VDS when the package is private |
| `GHCR_PULL_TOKEN` | Optional; PAT with `read:packages` for that login |
Runtime on the VDS (not stored in GitHub): a gitignored `config.yaml` and `.env` beside `docker-compose.yml`. The compose file pulls `ghcr.io/leshchenko1979/ai-gateway` tagged by `IMAGE_TAG` (default `main`). Set `GHCR_IMAGE` and `IMAGE_TAG` in the VDS `.env` if you use a fork or pin a digest tag.

Syncing config from your machine without committing secrets: copy `config.yaml` and `.env` from the examples, fill them in, then run `./scripts/sync-config-to-vds.sh`. That uploads `docker-compose.yml`, `config.yaml`, and a filtered `.env` (SSH and `GHCR_PULL_*` lines are stripped so they are not left on the server), optionally runs `docker login` on the VDS when `GHCR_PULL_USER` and `GHCR_PULL_TOKEN` are set locally, then runs `docker compose pull && up -d`. In Cursor/VS Code, use Tasks → Run Build Task and choose "Sync config and env to VDS" (non-default build task).

Local Docker build instead of pulling from GHCR: add a gitignored `docker-compose.override.yml` that sets `build: { context: ., dockerfile: Dockerfile }` on `ai-gateway` and drops `image:` for local use.

CLI alternative to the VS Code sync task: `./ops.sh deploy-docker` runs the same steps as `./scripts/sync-config-to-vds.sh`, then waits for the container. Use either the script or `ops.sh`; both match the current `docker-compose.yml` (pulled image + `./config.yaml` mount).
The gateway uses YAML configuration with environment variable substitution:
```yaml
api_key: ${GATEWAY_API_KEY}   # Gateway authentication key
port: 8080                    # Optional, defaults to 8080
default_timeout: 300s         # Example; omit for built-in default of 30s

providers:
  - name: cerebras
    api_key: ${CEREBRAS_API_KEY}
    base_url: https://api.cerebras.ai/v1
  - name: openrouter
    api_key: ${OPENROUTER_API_KEY}
    base_url: https://openrouter.ai/api/v1

routes:
  - name: dynamic/n8n   # Exact model name match required
    steps:
      - provider: cerebras
        model: gpt-oss-120b
        conflict_resolution: tools   # Remove response_format if tools present
      - provider: openrouter
        model: nvidia/nemotron-3-nano-30b-a3b:free
```

You can put your API keys directly into `config.yaml`, but for security it is better to store them in environment variables and reference them from `config.yaml`.
Configuration Locations:
- `./config.yaml` (current directory)
- `/etc/ai-gateway/config.yaml` (system location)
Environment Variables:
- `GATEWAY_API_KEY`: Required for authentication
- Provider API keys: `${PROVIDER_NAME}_API_KEY`
- Missing `${VAR}` values cause startup errors with a clear list of the missing variables
These endpoints do not require authentication:
- `GET /health`
- `GET /v1/diagnostics/upstream-models` — see Diagnostics
For all other routes (for example `GET /v1/models` and `POST /v1/chat/completions`), use `X-Api-Key` or `Authorization: Bearer <token>` with your configured gateway API key.
`GET /health`

Returns `{"status": "healthy"}`; no authentication required.
`GET /v1/diagnostics/upstream-models`

No authentication. Calls each configured provider's OpenAI-style `GET {base_url}/models` in parallel and returns JSON with an overall `ok` flag and per-provider results. Responds with 503 if any provider check fails, 200 if all succeed.
Security: This route is unauthenticated and triggers outbound requests to provider APIs. Do not expose it on the public internet without network restrictions (for example reverse-proxy allowlists, VPN, or private ingress).
GET /v1/models
Headers: `X-Api-Key: <gateway-api-key>` or `Authorization: Bearer <token>`

Returns the available route names from the configuration; these serve as the model names for requests.
POST /v1/chat/completions
Headers: `X-Api-Key: <gateway-api-key>` or `Authorization: Bearer <token>`

Routes requests to providers. Set `model` to the desired route name.
Response with routing summary:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "dynamic/n8n",
  "choices": [...],
  "routing_summary": {
    "route_name": "dynamic/n8n",
    "steps": [
      {"step_index": 0, "provider": "cerebras", "model": "gpt-oss-120b", "success": false, "duration_ms": 500, "error": "rate limit exceeded"},
      {"step_index": 1, "provider": "openrouter", "model": "nvidia/nemotron-3-nano-30b-a3b:free", "success": true, "duration_ms": 1234}
    ]
  }
}
```

The `routing_summary` field shows which route steps were attempted, which succeeded, and the timing in milliseconds for each step. Failed steps include the error message. The summary is included in both successful responses and in error responses when all steps fail.
```bash
sudo systemctl start ai-gateway    # Start service
sudo systemctl stop ai-gateway     # Stop service
sudo systemctl enable ai-gateway   # Enable auto-start
sudo systemctl status ai-gateway   # Check status
sudo journalctl -u ai-gateway -f   # View logs
```

- Security: API key redaction, non-root execution, restrictive file permissions (600), TLS recommended
- Logging: Structured JSON logs with request/response summaries, automatic key redaction
- Error Handling: Sequential provider fallback on any error, detailed error messages with provider info
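Key redaction before logging can be sketched with a regex over common token shapes. The pattern below is illustrative; the gateway's actual redaction rules may differ:

```go
package main

import (
	"fmt"
	"regexp"
)

// bearerPattern matches an Authorization-style Bearer token (illustrative).
var bearerPattern = regexp.MustCompile(`(?i)(Bearer\s+)\S+`)

// redactKeys masks Bearer tokens so raw secrets never reach the logs.
func redactKeys(s string) string {
	return bearerPattern.ReplaceAllString(s, "${1}[REDACTED]")
}

func main() {
	fmt.Println(redactKeys("Authorization: Bearer sk-live-abc123"))
	// Authorization: Bearer [REDACTED]
}
```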
The gateway can send OpenTelemetry traces and log events directly to any OTLP/HTTP-compliant collector. Configure the following environment variables to point at your observability backend (Grafana Cloud, Alloy, Tempo, or another OTLP destination):
- `OTLP_ENDPOINT`: Full URL to the OTLP HTTP endpoint. Supports both `host:port` and full URLs like `https://otlp-gateway.example.com/otlp`.
- `OTLP_API_KEY`: API key or token.
  - For Grafana Cloud: you can use a standard `glc_` Access Policy Token. The gateway automatically extracts the Instance ID from the token and handles the required Basic authentication (`instanceID:apiKey`).
  - For other collectors: the provided key is used for Basic authentication (`apiKey:`).
- `OTEL_SERVICE_NAME` (or `OTLP_SERVICE_NAME`): Optional. The service name (`ai-gateway`) used to group spans/logs.
- `OTEL_RESOURCE_ATTRIBUTES` (or `OTLP_RESOURCE_ATTRIBUTES`): Optional. Comma-separated `key=value` pairs added to each resource (e.g., `deployment.environment=production`).
- `OTLP_HEADERS`: Optional. Extra headers in `Key=Value` CSV format.
The gateway uses the OTLP HTTP exporter for maximum compatibility (bypassing gRPC/ALPN issues). It automatically handles the `/v1/traces` signal path, ensuring that if you provide a base URL (like Grafana's `/otlp`), it still reaches the correct endpoint.
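The endpoint normalization and Basic authentication described above can be sketched as follows. This is a sketch of the documented behavior under stated assumptions, not the gateway's code; `tracesURL` and `basicAuth` are hypothetical names:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// tracesURL appends the /v1/traces signal path unless it is already present,
// so a base endpoint like https://host/otlp still reaches the right route.
func tracesURL(endpoint string) string {
	e := strings.TrimSuffix(endpoint, "/")
	if strings.HasSuffix(e, "/v1/traces") {
		return e
	}
	return e + "/v1/traces"
}

// basicAuth builds an Authorization header value for user:key credentials,
// e.g. instanceID:apiKey for Grafana Cloud.
func basicAuth(user, key string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+key))
}

func main() {
	fmt.Println(tracesURL("https://otlp-gateway.example.com/otlp"))
	// https://otlp-gateway.example.com/otlp/v1/traces
	fmt.Println(basicAuth("123456", "glc_example_token"))
}
```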
Bug reports and pull requests are welcome. See CONTRIBUTING.md for setup, tests, and PR expectations. For security-sensitive issues, use the process in SECURITY.md instead of a public issue.
See SECURITY.md.