A lightweight, OpenAI-compatible API gateway written in Go that routes requests sequentially through configured providers until a successful response is received.
Build requirements: Go 1.25+ (see app/go.mod).
Born from Frustration: Created when Cloudflare AI Gateway unexpectedly started disconnecting users without explanation. This self-hosted alternative gives you full control with no vendor lock-in.
Daily Use Case: Connects to multiple AI providers with free tiers, automatically cycling between them when rate limits are hit - ensuring continuous service.
- Lightweight: Small stripped binary (~13MB typical local build) with minimal memory footprint
- Fast: Compiled Go with efficient runtime, no JVM overhead
- Reliable: Sequential provider fallback, automatic retry logic
- Simple: Single binary deployment, YAML configuration
- Secure: API key redaction, non-root execution, restrictive permissions
- OpenAI-Compatible: Drop-in replacement for the OpenAI API in tools like n8n. Just change the API base URL and make sure the requested model name matches one of your configured routes.
- Configure the gateway:

```bash
cp config.yaml.example config.yaml
# Edit config.yaml with your API keys
# See Configuration below
```

- Deploy locally:

```bash
./ops.sh build                    # Build binary
./ops.sh install-service          # Install as systemd service
sudo systemctl start ai-gateway   # Start service
```

- Or deploy remotely:

```bash
cp .env.example .env   # Configure SSH credentials
# Edit .env with your server details, or pass them inline on the command line
SSH_HOST=your-server.com ./ops.sh deploy
```

The commands below use `ops.sh` at the repo root (formerly `install.sh`).
- Local: `./ops.sh build` → `./ops.sh install-service` for development/production on the same machine
- Remote (systemd): `./ops.sh deploy` — SSH upload, remote installation, and systemd service setup
- Remote (Docker, CLI): `./ops.sh deploy-docker` — same as `./scripts/sync-config-to-vds.sh` (uploads `docker-compose.yml`, `config.yaml`, and a filtered `.env`; runs `docker compose pull && up -d` with the GHCR image and bind-mounted config)
- Binary-only: `./ops.sh install` — copies the binary only, no systemd service
For systemd deployments, use a reverse proxy like nginx or traefik to set up TLS termination and secure the traffic to your gateway.
Deploy as a container behind Traefik (or any reverse proxy) for HTTPS termination:
- Prerequisites: Docker on the remote server; Traefik with the `traefik-public` network
- Configure: ensure `config.yaml` and `.env` exist with your API keys (`GATEWAY_API_KEY`, provider keys)
- Deploy:

```bash
cp .env.example .env   # if needed
# Edit .env with SSH_HOST, SSH_USER, DOMAIN, and runtime vars (GATEWAY_API_KEY, etc.)
./ops.sh deploy-docker
```

- Domain: set `DOMAIN` in `.env` (e.g. `DOMAIN=ai-gateway.example.com`); docker-compose uses it for the Traefik Host rule.
- n8n integration: set Base URL to `https://YOUR_DOMAIN/v1` (your Traefik host), Model to a route name from `config.yaml`, and API Key to your `GATEWAY_API_KEY` value.
Pushes to `main` first run `go test ./... -count=1` in `app/`; if tests pass, the workflow builds the image on GitHub, pushes images to GHCR (`:main` and `:sha-<short>`), then SSHs into the VDS and runs `docker compose pull && docker compose up -d` in `/root/services/ai-gateway`.
Repository secrets (CI / deploy job)
| Secret | Purpose |
|---|---|
| `SSH_HOST` | VDS hostname or IP |
| `SSH_USER` | SSH user |
| `SSH_PRIVATE_KEY` | Private key (PEM) for that user |
| `SSH_PORT` | Optional; defaults to 22 if omitted |
| `GHCR_PULL_USER` | Optional; GitHub username for `docker login` on the VDS when the package is private |
| `GHCR_PULL_TOKEN` | Optional; PAT with `read:packages` for that login |
Runtime on the VDS (not stored in GitHub): a gitignored `config.yaml` and `.env` beside `docker-compose.yml`. The compose file pulls `ghcr.io/leshchenko1979/ai-gateway` tagged by `IMAGE_TAG` (default `main`). Set `GHCR_IMAGE` and `IMAGE_TAG` in the VDS `.env` if you use a fork or pin a digest tag.

Syncing config from your machine without committing secrets: copy `config.yaml` and `.env` from the examples, fill them in, then run `./scripts/sync-config-to-vds.sh`. That uploads `docker-compose.yml`, `config.yaml`, and a filtered `.env` (SSH and `GHCR_PULL_*` lines are stripped so they are not left on the server), optionally runs `docker login` on the VDS when `GHCR_PULL_USER` and `GHCR_PULL_TOKEN` are set locally, then runs `docker compose pull && up -d`. In Cursor/VS Code, use Tasks → Run Build Task and choose "Sync config and env to VDS" (non-default build task).

Local Docker build instead of pulling from GHCR: add a gitignored `docker-compose.override.yml` that sets `build: { context: ., dockerfile: Dockerfile }` on `ai-gateway` and drops `image:` for local use.

CLI alternative to the VS Code sync task: `./ops.sh deploy-docker` runs the same steps as `./scripts/sync-config-to-vds.sh`, then waits for the container. Use either the script or `ops.sh`; both match the current `docker-compose.yml` (pulled image + `./config.yaml` mount).
The gateway uses YAML configuration with environment variable substitution:
```yaml
api_key: ${GATEWAY_API_KEY}   # Gateway authentication key
port: 8080                    # Optional, defaults to 8080
default_timeout: 300s         # Example; omit for built-in default of 30s

providers:
  - name: cerebras
    api_key: ${CEREBRAS_API_KEY}
    base_url: https://api.cerebras.ai/v1
  - name: openrouter
    api_key: ${OPENROUTER_API_KEY}
    base_url: https://openrouter.ai/api/v1

routes:
  - name: dynamic/n8n   # Exact model name match required
    steps:
      - provider: cerebras
        model: gpt-oss-120b
        conflict_resolution: tools   # Remove response_format if tools present
      - provider: openrouter
        model: nvidia/nemotron-3-nano-30b-a3b:free
```

You can put your API keys directly into `config.yaml`, but for security it is better to store them in environment variables and reference them from `config.yaml`.
Configuration Locations:
- `./config.yaml` (current directory)
- `/etc/ai-gateway/config.yaml` (system location)
Environment Variables:
- `GATEWAY_API_KEY`: Required for authentication
- Provider API keys: `${PROVIDER_NAME}_API_KEY`
- Missing `${VAR}` values cause startup errors with a clear list of the missing variables
These endpoints do not require authentication:
- `GET /health`
- `GET /v1/diagnostics/upstream-models` — see Diagnostics
For all other routes (for example `GET /v1/models` and `POST /v1/chat/completions`), use `X-Api-Key` or `Authorization: Bearer <token>` with your configured gateway API key.
`GET /health`

Returns `{"status": "healthy"}`; no authentication required.
`GET /v1/diagnostics/upstream-models`

No authentication. Calls each configured provider's OpenAI-style `GET {base_url}/models` in parallel and returns JSON with an overall `ok` flag and per-provider results. Responds with 503 if any provider check fails, 200 if all succeed.
Security: This route is unauthenticated and triggers outbound requests to provider APIs. Do not expose it on the public internet without network restrictions (for example reverse-proxy allowlists, VPN, or private ingress).
GET /v1/models
Headers: `X-Api-Key: <gateway-api-key>` or `Authorization: Bearer <token>`

Returns the available route names from the configuration; these serve as the model names for requests.
POST /v1/chat/completions
Headers: `X-Api-Key: <gateway-api-key>` or `Authorization: Bearer <token>`

Routes requests to providers. Set `model` to the desired route name.
Response with routing summary:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "dynamic/n8n",
  "choices": [...],
  "routing_summary": {
    "route_name": "dynamic/n8n",
    "steps": [
      {"step_index": 0, "provider": "cerebras", "model": "gpt-oss-120b", "success": false, "duration_ms": 500, "error": "rate limit exceeded"},
      {"step_index": 1, "provider": "openrouter", "model": "nvidia/nemotron-3-nano-30b-a3b:free", "success": true, "duration_ms": 1234}
    ]
  }
}
```

The `routing_summary` field shows which route steps were attempted, which succeeded, and the timing in milliseconds for each step. Failed steps include the error message. The summary is included in both successful responses and in error responses when all steps fail.
```bash
sudo systemctl start ai-gateway    # Start service
sudo systemctl stop ai-gateway     # Stop service
sudo systemctl enable ai-gateway   # Enable auto-start
sudo systemctl status ai-gateway   # Check status
sudo journalctl -u ai-gateway -f   # View logs
```

- Security: API key redaction, non-root execution, restrictive file permissions (600), TLS recommended
- Logging: Structured JSON logs with request/response summaries, automatic key redaction
- Error Handling: Sequential provider fallback on any error, detailed error messages with provider info
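Key redaction before logging can be sketched with a regex over common token shapes. The pattern below is illustrative; the gateway's actual redaction rules may differ:

```go
package main

import (
	"fmt"
	"regexp"
)

// bearerPattern matches an Authorization-style Bearer token (illustrative).
var bearerPattern = regexp.MustCompile(`(?i)(Bearer\s+)\S+`)

// redactKeys masks Bearer tokens so raw secrets never reach the logs.
func redactKeys(s string) string {
	return bearerPattern.ReplaceAllString(s, "${1}[REDACTED]")
}

func main() {
	fmt.Println(redactKeys("Authorization: Bearer sk-live-abc123"))
	// Authorization: Bearer [REDACTED]
}
```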
The gateway can send OpenTelemetry traces and log events directly to any OTLP/HTTP-compliant collector. Configure the following environment variables to point at your observability backend (Grafana Cloud, Alloy, Tempo, or another OTLP destination):
- `OTLP_ENDPOINT`: Full URL to the OTLP HTTP endpoint. Supports both `host:port` and full URLs like `https://otlp-gateway.example.com/otlp`.
- `OTLP_API_KEY`: API key or token.
  - For Grafana Cloud: you can use a standard `glc_` Access Policy Token. The gateway automatically extracts the Instance ID from the token and handles the required Basic authentication (`instanceID:apiKey`).
  - For other collectors: the provided key is used for Basic authentication (`apiKey:`).
- `OTEL_SERVICE_NAME` (or `OTLP_SERVICE_NAME`): Optional. The service name (`ai-gateway`) used to group spans/logs.
- `OTEL_RESOURCE_ATTRIBUTES` (or `OTLP_RESOURCE_ATTRIBUTES`): Optional. Comma-separated `key=value` pairs added to each resource (e.g., `deployment.environment=production`).
- `OTLP_HEADERS`: Optional. Extra headers in `Key=Value` CSV format.
The gateway uses the OTLP HTTP exporter for maximum compatibility (bypassing gRPC/ALPN issues). It automatically handles the `/v1/traces` signal path, ensuring that if you provide a base URL (like Grafana's `/otlp`), it still reaches the correct endpoint.
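The endpoint normalization and Basic authentication described above can be sketched as follows. This is a sketch of the documented behavior under stated assumptions, not the gateway's code; `tracesURL` and `basicAuth` are hypothetical names:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// tracesURL appends the /v1/traces signal path unless it is already present,
// so a base endpoint like https://host/otlp still reaches the right route.
func tracesURL(endpoint string) string {
	e := strings.TrimSuffix(endpoint, "/")
	if strings.HasSuffix(e, "/v1/traces") {
		return e
	}
	return e + "/v1/traces"
}

// basicAuth builds an Authorization header value for user:key credentials,
// e.g. instanceID:apiKey for Grafana Cloud.
func basicAuth(user, key string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+key))
}

func main() {
	fmt.Println(tracesURL("https://otlp-gateway.example.com/otlp"))
	// https://otlp-gateway.example.com/otlp/v1/traces
	fmt.Println(basicAuth("123456", "glc_example_token"))
}
```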
Bug reports and pull requests are welcome. See CONTRIBUTING.md for setup, tests, and PR expectations. For security-sensitive issues, use the process in SECURITY.md instead of a public issue.
See SECURITY.md.