pw-router

A minimal, auditable LLM gateway for regulated environments.

MIT License | Python 3.12 / FastAPI | 4 dependencies | ~1,000 lines of core

Built by Protocol Wealth LLC, an SEC-registered investment adviser that actually routes client-adjacent AI workloads through this.

The selling point isn't speed benchmarks. It's that a compliance officer can read the entire codebase in an afternoon.

Why This Exists

In March 2025, LiteLLM suffered a supply-chain attack via a compromised dependency. LiteLLM has 50+ transitive dependencies, 800+ open GitHub issues, and a codebase that's impossible for a compliance team to audit.

If you're routing AI workloads in a regulated environment — financial services, healthcare, legal — you need a gateway you can actually read, understand, and sign off on. pw-router is that gateway.

What pw-router does:

OpenAI-compatible API in front of multiple LLM providers
Circuit breaker per model with automatic fallback chains
Tag-based routing (e.g., PII-flagged requests → self-hosted models only)
Pluggable middleware for compliance hooks (PII scanning, audit logging, RBAC)
YAML config with env var expansion — no database required

What pw-router does NOT do:

Store any data (stateless — logs go to stdout for your log aggregator)
Make compliance decisions (that's your middleware plugins' job)
Provide a UI (it's an API gateway, not a platform)
Phone home, collect telemetry, or run analytics

How It Works

Client (OpenAI SDK)
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│                      pw-router                          │
│                                                         │
│  1. Auth ─── Validate API key, resolve client identity  │
│       │                                                 │
│  2. Pre-request middleware ─── PII scan, tag, log       │
│       │                                                 │
│  3. Router engine                                       │
│       ├── Match: explicit model / tag rule / default    │
│       ├── Circuit breaker: skip unhealthy providers     │
│       └── Fallback chain: try next if primary is down   │
│       │                                                 │
│  4. Provider adapter ─── Translate to provider format   │
│       │                                                 │
│  5. Post-response middleware ─── Audit, PII scan output │
│       │                                                 │
│  6. Return OpenAI-format response                       │
└─────────────────────────────────────────────────────────┘
    │
    ▼
LLM Providers (Anthropic, OpenAI, vLLM/RunPod, Ollama, ...)

The entire core is 8 files:

pw_router/
├── server.py      # FastAPI app, routes, auth         (263 lines)
├── providers.py   # OpenAI, Anthropic, vLLM adapters  (306 lines)
├── router.py      # Model selection, circuit breaker   (175 lines)
├── middleware.py   # Pre/post hook system, plugin loader (93 lines)
├── config.py      # YAML loader, env var expansion      (77 lines)
├── health.py      # Background health checks            (51 lines)
├── models.py      # Shared exceptions                   (29 lines)
└── __main__.py    # CLI entry point                     (33 lines)

Quick Start

Install

pip install pw-router

Or from source:

git clone https://github.com/Protocol-Wealth/pw-router.git
cd pw-router
pip install -e ".[dev]"

Configure

cp config.example.yaml config.yaml

Edit config.yaml with your provider API keys and models:

server:
  host: "0.0.0.0"
  port: 8100
  api_keys:
    - key: "${PW_ROUTER_API_KEY_1}"
      name: "my-app"
      allowed_models: ["*"]

models:
  claude-sonnet:
    provider: anthropic
    model: "claude-sonnet-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    timeout_seconds: 120
    tags: ["external", "reasoning"]

  local-llama:
    provider: vllm
    model: "meta-llama/Llama-3.1-70B-Instruct"
    api_key: "${RUNPOD_API_KEY}"
    base_url: "https://api.runpod.ai/v2/${RUNPOD_ENDPOINT_ID}/openai/v1"
    timeout_seconds: 90
    tags: ["self-hosted", "client-safe"]

routing:
  default_model: "claude-sonnet"
  fallback_chains:
    reasoning: ["claude-sonnet", "local-llama"]
    client-safe: ["local-llama"]
  rules:
    - match:
        tag: "client-data"
      route_to_chain: "client-safe"

health:
  check_interval_seconds: 30
  unhealthy_threshold: 3
  healthy_threshold: 1
  check_timeout_seconds: 5

middleware:
  pre_request: []
  post_response: []

Environment variables referenced as ${VAR_NAME} in the YAML are expanded at load time.

Run

# Set your env vars
export ANTHROPIC_API_KEY=sk-ant-...
export PW_ROUTER_API_KEY_1=your-secret-key

# Start the router
pw-router --config config.yaml

# Or with uvicorn directly
uvicorn pw_router.server:app --host 0.0.0.0 --port 8100

Use

Any OpenAI SDK client works out of the box:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8100/v1",
    api_key="your-secret-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Streaming
for chunk in client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")

API Reference

Method	Path	Description
POST	`/v1/chat/completions`	Chat completions (streaming + non-streaming)
GET	`/v1/models`	List available models (filtered by API key)
GET	`/health`	Router health + per-model circuit breaker status

All endpoints accept Authorization: Bearer <your-api-key> (except /health).

POST /v1/chat/completions

Standard OpenAI chat completions format. Supports stream: true.

GET /v1/models

Returns models the authenticated client is allowed to use:

{
  "object": "list",
  "data": [
    {"id": "claude-sonnet", "object": "model", "owned_by": "anthropic"},
    {"id": "local-llama", "object": "model", "owned_by": "vllm"}
  ]
}

GET /health

No auth required. Returns per-model circuit breaker state:

{
  "status": "healthy",
  "version": "0.1.0",
  "models": {
    "claude-sonnet": {"status": "healthy", "circuit": "closed"},
    "local-llama": {"status": "unhealthy", "circuit": "open"}
  }
}

Configuration Reference

See config.example.yaml for a complete template.

API Key Auth

Each API key has a name (for audit logging) and a model allowlist:

server:
  api_keys:
    - key: "${KEY_1}"
      name: "backend"
      allowed_models: ["*"]           # Wildcard — all models
    - key: "${KEY_2}"
      name: "frontend"
      allowed_models: ["local-*"]     # Glob — only self-hosted

Routing

Explicit model: Client specifies "model": "claude-sonnet" in the request. Routed directly if client is allowed.

Tag-based rules: Middleware plugins add tags (e.g., "client-data"), which match routing rules:

routing:
  rules:
    - match:
        tag: "client-data"
      route_to_chain: "client-safe"   # Only self-hosted models

Fallback chains: If the selected model's circuit breaker is open, the router walks the fallback chain until it finds a healthy model:

routing:
  fallback_chains:
    reasoning: ["claude-sonnet", "gpt-4o", "local-llama"]

Circuit Breaker

Per-model, in-memory. Resets on restart (safe default).

CLOSED  ──(3 consecutive failures)──▶  OPEN
OPEN    ──(30s cooldown)──▶            HALF_OPEN
HALF_OPEN ──(1 success)──▶            CLOSED
HALF_OPEN ──(1 failure)──▶            OPEN

Writing Plugins

Plugins are Python modules with pre_request and/or post_response async functions.

# plugins/my_plugin.py
from pw_router.middleware import MiddlewareContext, MiddlewareResult

async def pre_request(ctx: MiddlewareContext) -> MiddlewareResult:
    """Called before the request is sent to the provider."""
    # ctx.request_body — mutable OpenAI-format request dict
    # ctx.client_name  — authenticated client identity
    # ctx.tags         — mutable set; add tags to influence routing
    # ctx.metadata     — mutable dict; pass data to post_response
    # ctx.config       — plugin-specific config from YAML

    if contains_pii(ctx.request_body):
        ctx.tags.add("client-data")  # Route to self-hosted only

    return MiddlewareResult(allow=True)

async def post_response(ctx: MiddlewareContext) -> MiddlewareResult:
    """Called after the response is received."""
    # ctx.response_body — OpenAI-format response dict
    # ctx.model_used    — actual model name
    # ctx.latency_ms    — request latency
    # ctx.provider      — provider name

    log_audit_event(ctx)
    return MiddlewareResult(allow=True)

Register in config.yaml:

middleware:
  pre_request:
    - plugin: "plugins.my_plugin"
      config:
        some_setting: "value"
  post_response:
    - plugin: "plugins.my_plugin"

To block a request, return MiddlewareResult(allow=False, error_message="Reason", status_code=403).

See plugins/example_redact.py and plugins/example_logger.py for working examples. Full plugin guide: docs/plugins.md.

Provider Adapters

Provider	Adapter	Notes
OpenAI	`openai`	Native format pass-through
Anthropic	`anthropic`	Translates OpenAI ↔ Anthropic Messages API
vLLM / RunPod	`vllm`	OpenAI-compatible with custom `base_url`

Each adapter implements:

chat_completion(body, model_config, stream=False) — send request, return OpenAI-format response
health_check(model_config) — return True if endpoint is responsive

Adding a provider is ~30-50 lines. See docs/architecture.md.

Deployment

Docker

docker build -t pw-router .
docker run -p 8100:8100 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e PW_ROUTER_API_KEY_1=your-key \
  -v $(pwd)/config.yaml:/app/config.yaml \
  pw-router

Fly.io

cp fly.toml.example fly.toml
# Edit fly.toml with your app name
fly launch
fly secrets set ANTHROPIC_API_KEY=sk-ant-... PW_ROUTER_API_KEY_1=your-key
fly deploy

See docs/deployment.md for detailed instructions.

Comparison

	pw-router	LiteLLM	Bifrost
Language	Python	Python	Go
License	MIT	MIT	Apache 2.0
Core LOC	~1,000	~50,000+	~15,000+
Dependencies	4	50+	~10
Provider support	3 built-in	100+	20+
Designed for	Auditability	Breadth	Performance
Circuit breaker	Yes	Yes	Yes
Fallback chains	Yes	Yes	Yes
Middleware hooks	Pre/post	Callbacks	Plugins
OpenAI-compatible	Yes	Yes	Yes

pw-router deliberately trades breadth for auditability. If you need 100 provider integrations, use LiteLLM. If you need to hand your codebase to a compliance officer and have them understand it by lunch, use pw-router.

Roadmap

v0.1.0 (current)

/v1/chat/completions (streaming + non-streaming)
/v1/models and /health
OpenAI, Anthropic, and vLLM provider adapters
YAML config with env var expansion
API key auth with per-key model allowlists
Circuit breaker per model with fallback chains
Pluggable pre/post middleware hooks
Background health checks
Full test suite

v0.2.0 (next)

/v1/completions and /v1/embeddings
/metrics endpoint (request counts, latency percentiles)
Ollama adapter
Custom HTTP adapter (generic)
Token counting / budget limits per API key
Response caching

Contributing

pw-router is intentionally minimal. Before adding a feature, ask:

Does this belong in core or a plugin? Anything opinionated about compliance, auth, logging, or data handling should be a plugin.
Does this increase the dependency count? Strong bias against new dependencies.
Can a compliance officer still read the core in an afternoon? If a PR pushes core past ~1,500 lines, it's probably doing too much.

See CONTRIBUTING.md for development setup and guidelines.

Security

Report vulnerabilities via SECURITY.md. Do not open a public issue.

License

MIT. See LICENSE.

Architectural patterns (circuit breaker, fallback chains, health-check loops) are informed by Bifrost (Apache 2.0, Maxim AI). No code was copied; patterns were reimplemented in Python/FastAPI. Per Apache 2.0 Section 4, this notice serves as attribution.

Built by Protocol Wealth LLC — SEC-registered investment adviser (CRD #335298).

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github		.github
docs		docs
plugins		plugins
pw_router		pw_router
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
config.example.yaml		config.example.yaml
fly.toml.example		fly.toml.example
llms-full.txt		llms-full.txt
llms.txt		llms.txt
pyproject.toml		pyproject.toml
requirements.lock		requirements.lock

Folders and files

Latest commit

History

Repository files navigation

pw-router

Why This Exists

How It Works

Quick Start

Install

Configure

Run

Use

API Reference

POST /v1/chat/completions

GET /v1/models

GET /health

Configuration Reference

API Key Auth

Routing

Circuit Breaker

Writing Plugins

Provider Adapters

Deployment

Docker

Fly.io

Comparison

Roadmap

v0.1.0 (current)

v0.2.0 (next)

Contributing

Security

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages