Your AI. Your infrastructure. Zero middlemen. Security built for zero trust.
macOS App (signed) • CLI • Linux systemd
Every prompt you send to cloud AI providers is logged. Stored. Possibly used to train their models.
Private LLM changes the game:
- 4096-bit RSA, TLS 1.3 – exceeds typical enterprise standards
- HSM-backed key management – hardware security module, 90-day auto-rotation
- Aggressive key rotation – fresh certs every VM boot
- Zero-trust architecture – CA key never leaves your machine
```sh
# macOS: Download, sign into GCP once, run `private-llm up`
# CLI: one-liner install, interactive setup, done
# Your tools think it's local Ollama
$ ollama run stewartpark/qwen3.5
```

| | Cloud AI Providers | Private LLM |
|---|---|---|
| Your prompts | Logged, stored, possibly trained on | Never leave your infrastructure, certs auto-rotate |
| Cost | Per token, opaque pricing | GPU hourly, scales to zero |
| Control | Their rates, their uptime, their rules | You own the VM, you set idle timeout |
| Compliance | Their SOC 2, their BAA | Your GCP project, your KMS keys |
- Download the latest release
- Sign into GCP (one-time):

  ```sh
  gcloud auth application-default login
  ```

- Run `up` from the menu bar and follow the interactive prompts

Done. The menu bar icon shows status. No terminal needed.
```sh
curl -fsSL https://raw.githubusercontent.com/stewartpark/private-llm/main/misc/install.sh | sh
```

Then:

```sh
$ gcloud auth application-default login  # one-time
$ private-llm up                         # interactive setup
$ private-llm                            # start dashboard
```

Total time: ~5 min (first boot: ~30 min for package installs; subsequent boots: 3-5 min).
```mermaid
flowchart LR
    subgraph "Your Machine"
        A[Your Tools<br/>ollama CLI, Cursor, etc.]
        B[private-llm CLI<br/>Proxy daemon]
    end
    subgraph GCP[GCP Cloud]
        C{VM Running?}
        D[Compute API<br/>Start VM]
        E[Secret Manager<br/>Server certs + token]
        F[GPU VM<br/>Ollama]
    end
    A -->|localhost:11434| B
    B -->|request| C
    C -->|No| B
    B -->|1. Detect IP<br/>2. Open firewall<br/>3. Rotate certs<br/>4. Upload to SM| E
    B -->|5. Start VM| D
    D --> F
    F -->|6. Fetch certs at boot| E
    F -->|7. Boot Ollama| B
    C -->|Yes| F
    F -->|response| B
    B -->|SSE stream| A
    style A fill:#22c55e,stroke:#166534
    style B fill:#3b82f6,stroke:#1e40af,color:white
    style F fill:#8b5cf6,stroke:#6b21a8,color:white
    style E fill:#16a34a,stroke:#14532d,color:white
```
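The cold-start path in the flowchart reduces to a small piece of proxy logic: check whether the VM is up, start it if not, then forward the request. A minimal sketch under that reading (the `FakeVM` and `forward` names are illustrative, not the actual CLI internals):

```python
def handle_request(request, vm, forward):
    """Lazy-start sketch: boot the GPU VM only when a request arrives.

    `vm` is any object exposing .running, .start(), and .wait_ready();
    `forward` sends the request on to Ollama once the VM is up.
    (Illustrative interface, not the real private-llm internals.)
    """
    if not vm.running:
        vm.start()       # steps 1-5 in the diagram: firewall, certs, Compute API
        vm.wait_ready()  # steps 6-7: VM fetches certs, boots Ollama
    return forward(request)

class FakeVM:
    """Stand-in VM for demonstrating the lazy-start behavior."""
    def __init__(self):
        self.running = False
        self.starts = 0
    def start(self):
        self.starts += 1
        self.running = True
    def wait_ready(self):
        pass
```

Subsequent requests skip straight to the "Yes" branch, which is why only the first request after idle pays the boot delay.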
- Install (app or CLI) – CA private key stays on your machine
- Provision – `private-llm up` creates VPC, KMS HSM key, shielded VM
- Run – `private-llm` starts the proxy with a live TUI dashboard
- Use – any Ollama tool works (localhost:11434)
- Scale to zero – VM auto-stops after 5 min idle ($0 when not in use)
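The scale-to-zero step is essentially an idle timer: record the time of the last request and stop the VM once 5 idle minutes elapse. A testable sketch, with the stop call and the clock injected (both hypothetical stand-ins; the real CLI talks to the GCP Compute API):

```python
import time

IDLE_TIMEOUT = 5 * 60  # seconds; matches the 5-minute default above

class IdleStopper:
    """Sketch of scale-to-zero: stop the VM after IDLE_TIMEOUT with no requests."""

    def __init__(self, stop_vm, clock=time.monotonic):
        self.stop_vm = stop_vm        # injected: real impl would call the Compute API
        self.clock = clock            # injectable so the logic is testable
        self.last_request = clock()
        self.stopped = False

    def on_request(self):
        """Any proxied request resets the idle window."""
        self.last_request = self.clock()
        self.stopped = False

    def tick(self):
        """Called periodically; stops the VM once the idle window expires."""
        if not self.stopped and self.clock() - self.last_request >= IDLE_TIMEOUT:
            self.stop_vm()
            self.stopped = True
```

Injecting the clock keeps the timeout logic deterministic under test without waiting five real minutes.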
```mermaid
graph TB
    subgraph "Your Machine"
        A[CA Private Key<br/>~/.config/private-llm/certs/ca.key]
        B[Client Cert + Key<br/>~/.config/private-llm/certs/]
        P[private-llm Proxy<br/>localhost:11434]
    end
    subgraph GCP[GCP Cloud]
        subgraph "Key Management"
            C[KMS HSM Key<br/>Auto-rotate 90 days]
            D[Secret Manager<br/>Server certs + bearer token]
        end
        subgraph "Compute"
            E[Shielded VM<br/>Secure Boot + vTPM]
        end
    end
    subgraph "Defense Layers"
        F[mTLS Validation<br/>4096-bit RSA, TLS 1.3]
        G[Fingerprint Pinning<br/>SHA-256 in memory]
        H[Dynamic Firewall<br/>Your IP only]
    end
    A -.->|never leaves your machine| B
    B -.->|loads| P
    P ==>|mTLS request| E
    C -->|encrypts| D
    D -->|boot retrieval| E
    E -->|every request| F
    F -->|verifies| G
    H -->|IP-locked access| E
    style A fill:#dc2626,stroke:#991b1b,color:white
    style B fill:#ef4444,stroke:#991b1b
    style P fill:#3b82f6,stroke:#1e40af,color:white
    style C fill:#16a34a,stroke:#14532d,color:white
    style D fill:#16a34a,stroke:#14532d,color:white
    style F fill:#f59e0b,stroke:#92400e
    style G fill:#f59e0b,stroke:#92400e
    style H fill:#f59e0b,stroke:#92400e
```
Zero-trust model: CA key isolation means GCP cannot forge certificates or intercept traffic (only your machine can sign certs). Fingerprint pinning detects MITM attacks. Firewall rule deleted when you quit.
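The fingerprint pinning mentioned above boils down to hashing the server certificate's DER bytes with SHA-256 and comparing against the pinned value. A minimal sketch (in a real TLS client the DER bytes would come from `ssl.SSLSocket.getpeercert(binary_form=True)`; this is not the project's actual code):

```python
import hashlib
import hmac

def cert_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a certificate's DER encoding, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()

def verify_pin(der_bytes: bytes, pinned: str) -> bool:
    """Accept the connection only if the presented cert matches the pinned hash.

    hmac.compare_digest gives a constant-time comparison, so a MITM cannot
    learn how many leading hex digits matched.
    """
    return hmac.compare_digest(cert_fingerprint(der_bytes), pinned)
```

Because the pin is held in memory on the client, a forged certificate fails this check even if it chains to a CA the OS otherwise trusts.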
| Type | GPU | VRAM | Best For | ~$/hr |
|---|---|---|---|---|
| g2-standard-4 | L4 | 24GB | 7B-13B models | 0.25 |
| g4-standard-48 | RTX 6000 | 96GB | 70B+ models (default) | 1.80 |
| a2-standard-12 | A100 | 40GB | Legacy | 0.50 |
| a3-standard-8 | H100 | 80GB | Cutting-edge | 2.50 |
Monthly cost (g2-standard-4): $18 (always off) → $28 (40 hrs) → $58 (160 hrs) → $200 (24/7)
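Those monthly figures are consistent with a simple model: a fixed baseline of about $18/month (inferred from the "always off" figure; the exact breakdown into disk, IP, etc. is an assumption, not documented here) plus GPU hours at the table's hourly rate:

```python
FIXED_MONTHLY = 18.0  # inferred baseline from the "$18 always off" figure
RATES = {             # ~$/hr from the machine-type table above
    "g2-standard-4": 0.25,
    "g4-standard-48": 1.80,
    "a2-standard-12": 0.50,
    "a3-standard-8": 2.50,
}

def monthly_cost(machine_type: str, hours: float) -> float:
    """Estimated monthly bill: fixed baseline + GPU hours at the hourly rate."""
    return FIXED_MONTHLY + RATES[machine_type] * hours
```

Checking against the figures above: 40 hrs gives $28, 160 hrs gives $58, and 24/7 (720 hrs) gives about $198, i.e. roughly the quoted $200.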
Running `private-llm` opens a live TUI with real-time stats.
Any Ollama-compatible tool works:

- CLI: `ollama run llama3.2`
- Agents: opencode, Aider, Codex CLI, Claude Code (via `ollama launch`)
- IDEs: Cursor, VS Code + Ollama extensions
- Custom: OpenAI API compatible (just change `base_url` to `http://localhost:11434`)
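"Just change `base_url`" means an unmodified OpenAI-style client talks to the local proxy. A sketch that builds (but does not send) a standard chat-completions request against the local endpoint; the model name `llama3.2` is only an example and must already be pulled:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # the private-llm proxy, not api.openai.com

def chat_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at the local proxy.

    Only the base URL differs from a stock OpenAI client; the
    /v1/chat/completions path and payload shape are the standard ones.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(chat_request("hello"))` works once the proxy is up; attaching `data` makes it a POST automatically.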
```sh
private-llm up               # Provision infrastructure
private-llm down             # Destroy infrastructure
private-llm                  # Start dashboard (proxy runs here)
private-llm rotate-mtls-ca   # Emergency: rotate all certs
```

TUI Controls: q quit | r restart | R reset (recreate) | S toggle VM
Config: `~/.config/private-llm/agent.json` (see CONFIG.md for all options)
Docs: AGENTS.md – architecture & design | SECURITY.md – threat model & controls | Linux packaging
PolyForm Noncommercial 1.0.0 – Free for personal/internal use. Not for SaaS or resale.
