A self-hosted ChatGPT replacement for your locally running models. Connects to any OpenAI-compatible inference endpoint — vLLM, Ollama, LM Studio, llama.cpp server — and gives you a polished chat UI with no data leaving your machine.
ChatGPT and Claude are great, but sometimes you need:
- Full privacy — every token stays on your hardware
- No rate limits — your GPU, your rules
- Bleeding-edge models — run whatever is on Hugging Face today without waiting for an API
- Tool use with local context — connect MCP servers to give the model access to your files, APIs, and services
ChatLocal gives you the ChatGPT-style experience on top of your own inference stack.
- Real-time streaming — tokens appear as they are generated via WebSocket
- Thinking blocks — collapsible reasoning display for models like Qwen3 that emit `<think>` tokens
- MCP tool integration — connect any MCP server (stdio transport); tools run automatically and appear inline as collapsible cards
- Session management — persistent chat history in SQLite, auto-named after the first exchange
- Full-text search — FTS5-powered search across all messages in the sidebar
- Vision support — attach images via paste, drag-and-drop, or file picker
- Context compression — automatically summarizes old messages when approaching the model's context limit
- Markdown rendering — tables, code blocks with syntax highlighting and one-click copy, links
- Export — download any session as a Markdown file
- Settings — configurable system prompt, temperature, top-p, max tokens
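
The thinking-block feature above can be pictured with a small parser. This is a minimal sketch, not ChatLocal's actual implementation: it assumes the model wraps its reasoning in a single `<think>...</think>` pair, as Qwen3 does, and the function name `splitThinking` and its return shape are hypothetical.

```javascript
// Hypothetical helper: separate a model's reasoning from its visible answer.
// Assumes at most one <think>...</think> pair per message.
function splitThinking(text) {
  const open = "<think>";
  const close = "</think>";
  const start = text.indexOf(open);

  // No reasoning block at all: everything is the answer.
  if (start === -1) {
    return { thinking: "", answer: text.trim(), thinkingDone: true };
  }

  const end = text.indexOf(close, start + open.length);

  // Opening tag seen but not yet closed: the model is still reasoning,
  // which is the normal state midway through a streamed response.
  if (end === -1) {
    return {
      thinking: text.slice(start + open.length).trim(),
      answer: text.slice(0, start).trim(),
      thinkingDone: false,
    };
  }

  return {
    thinking: text.slice(start + open.length, end).trim(),
    answer: (text.slice(0, start) + text.slice(end + close.length)).trim(),
    thinkingDone: true,
  };
}
```

Calling the function on the accumulated text after each streamed chunk lets a UI keep the reasoning in a collapsible area while `thinkingDone` is `false`, then render the answer once the closing tag arrives.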

Prerequisites:

- Node.js 20+
- An OpenAI-compatible inference server (vLLM, Ollama, LM Studio, etc.) running locally or on your network
npm install

Create a .env file:
VLLM_BASE_URL=http://localhost:8000 # your inference server base URL
PORT=3000
DATABASE_PATH=./data/chatlocal.db
TAVILY_API_KEY= # optional, for web search via MCP

Start the server:

npm start

Open http://localhost:3000.
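
For orientation, here is one way a Node.js server might read the variables from the .env file above. It is a sketch under stated assumptions: `loadConfig` is a hypothetical name, and the fallback defaults simply mirror the example values shown above rather than anything ChatLocal is documented to do.

```javascript
// Hypothetical config loader; the defaults are assumptions that mirror the
// example .env values, not documented ChatLocal behavior.
function loadConfig(env = process.env) {
  // Drop trailing slashes so paths like /v1/models can be appended cleanly.
  const baseUrl = (env.VLLM_BASE_URL || "http://localhost:8000").replace(/\/+$/, "");

  const port = Number.parseInt(env.PORT || "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    baseUrl,
    port,
    databasePath: env.DATABASE_PATH || "./data/chatlocal.db",
    // An empty TAVILY_API_KEY is treated as "not configured".
    tavilyApiKey: env.TAVILY_API_KEY || null,
  };
}
```

Passing the environment in as an argument, instead of reading `process.env` directly, keeps the function easy to exercise with different values.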
Tool servers are configured in mcp-servers.json. Each entry defines a stdio process that ChatLocal spawns on startup. Environment variable references such as ${TAVILY_API_KEY} are interpolated automatically.
{
"tavily-remote": {
"command": "npx",
"args": ["-y", "mcp-remote", "https://mcp.tavily.com/mcp/?tavilyApiKey=${TAVILY_API_KEY}"]
}
}

Any MCP server that speaks the stdio transport works here. Changes to mcp-servers.json require a server restart.
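
To make the `${...}` interpolation concrete, the sketch below expands environment references throughout a parsed config object. It is illustrative only: `interpolateEnv` is a hypothetical name, and throwing on an unset variable is an assumption, since the text above does not say how ChatLocal treats missing values.

```javascript
// Hypothetical helper: recursively replace ${NAME} references in strings,
// arrays, and objects with values taken from `env`.
function interpolateEnv(value, env = process.env) {
  if (typeof value === "string") {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) => {
      // Assumption: fail loudly rather than spawn a server with a blank key.
      if (env[name] === undefined) {
        throw new Error(`Missing environment variable: ${name}`);
      }
      return env[name];
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolateEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, interpolateEnv(item, env)])
    );
  }
  return value; // numbers, booleans, and null pass through unchanged
}

// The entry from the example above, expanded with a placeholder key.
const exampleServers = interpolateEnv(
  {
    "tavily-remote": {
      command: "npx",
      args: ["-y", "mcp-remote", "https://mcp.tavily.com/mcp/?tavilyApiKey=${TAVILY_API_KEY}"],
    },
  },
  { TAVILY_API_KEY: "tvly-example" }
);
```

Each resulting entry would then supply the `command` and `args` for the spawned stdio process.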
| Backend | VLLM_BASE_URL |
|---|---|
| vLLM | http://localhost:8000 |
| Ollama | http://localhost:11434 |
| LM Studio | http://localhost:1234 |
| llama.cpp server | http://localhost:8080 |
The model list is populated automatically from GET /v1/models. If your backend doesn't serve that endpoint, set the model ID manually in the settings.
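
A rough sketch of that lookup follows, assuming the backend returns the standard OpenAI list shape (`{ "data": [{ "id": "..." }] }`). The names `extractModelIds` and `listModels` are hypothetical, and returning an empty list on failure is just one way to signal that the model ID has to be set manually.

```javascript
// Pull model IDs out of an OpenAI-style list response: { data: [{ id }, ...] }.
function extractModelIds(body) {
  if (!body || !Array.isArray(body.data)) return [];
  return body.data
    .map((model) => model && model.id)
    .filter((id) => typeof id === "string");
}

// Query GET /v1/models on the configured backend. `fetchImpl` defaults to the
// global fetch that ships with Node.js 18+ and can be swapped out in tests.
async function listModels(baseUrl, fetchImpl = fetch) {
  const url = `${baseUrl.replace(/\/+$/, "")}/v1/models`;
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return []; // endpoint missing or erroring
    return extractModelIds(await res.json());
  } catch {
    return []; // server unreachable or response was not JSON
  }
}
```

For example, `await listModels("http://localhost:11434")` would query an Ollama instance running on its default port.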