Minimal FastAPI service that connects to Ollama and ships with a tiny web playground (model selector, temperature, top-p, max tokens, streaming).
Runs with one command via Docker Compose.
- FastAPI + Uvicorn — API & simple HTML playground
- Ollama — local LLMs (phi3, qwen2.5-coder, llama3, deepseek-r1, …)
- Docker / Compose — reproducible environment
- dotenv — config via .env
- pytest — optional tests (if you add them)
- GitHub Actions — optional CI/CD for container images
- GET /models — proxies Ollama /api/tags (model list)
- GET/POST /chat — non-streaming response
- GET /stream — token streaming (text/plain)
- GET /playground — minimal web UI
- GET /health — connectivity check
- (Optional) GET /warmup?model=phi3:mini — preload/keep model warm
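For reference, the /models proxy can be as small as the sketch below. This is a minimal sketch assuming httpx as the async HTTP client; the actual app/main.py may differ.

```python
# Minimal /models proxy sketch (httpx is an assumption; see app/main.py for the real route).
import os

import httpx
from fastapi import FastAPI, HTTPException

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")

app = FastAPI()

@app.get("/models")
async def list_models():
    # Forward the request to Ollama's /api/tags and return its JSON unchanged.
    try:
        async with httpx.AsyncClient(timeout=10) as client:
            resp = await client.get(f"{OLLAMA_URL}/api/tags")
            resp.raise_for_status()
            return resp.json()
    except httpx.HTTPError as exc:
        raise HTTPException(status_code=502, detail=f"Cannot reach Ollama: {exc}")
```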
```
.
├── app/
│   ├── __init__.py
│   └── main.py
├── .env                 # local (do NOT commit)
├── .env.example         # template to share
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
```

Copy .env.example → .env and adjust as needed.

```env
# If Ollama runs on the host (Windows/macOS)
OLLAMA_URL=http://host.docker.internal:11434
# Linux: either add extra_hosts in compose or use the docker bridge IP
# OLLAMA_URL=http://172.17.0.1:11434
DEFAULT_MODEL=phi3:mini
ALLOWED_MODELS=phi3:mini,qwen2.5-coder:7b,llama3.1:latest,deepseek-r1:7b
# Optional tuning
READ_TIMEOUT=300
OLLAMA_KEEP_ALIVE=30m
```
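Inside the app these values are typically read via python-dotenv, roughly as sketched below (variable names match .env.example; the real main.py may organize this differently).

```python
# Config loading sketch (python-dotenv; variable names match .env.example).
import os

from dotenv import load_dotenv

load_dotenv()  # copy variables from .env into the process environment

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "phi3:mini")
ALLOWED_MODELS = [m.strip() for m in os.getenv("ALLOWED_MODELS", DEFAULT_MODEL).split(",")]
READ_TIMEOUT = int(os.getenv("READ_TIMEOUT", "300"))
OLLAMA_KEEP_ALIVE = os.getenv("OLLAMA_KEEP_ALIVE", "30m")
```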
Start Ollama and pull at least one model:

```bash
ollama serve
ollama pull phi3:mini
```

Then build and run:

```bash
docker compose up -d --build
```

Access endpoints:
- 🎨 Playground → http://127.0.0.1:8000/playground
- 📘 Docs → http://127.0.0.1:8000/docs
- 💚 Health → http://127.0.0.1:8000/health
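A quick smoke test from Python, assuming the requests library is installed and both endpoints return JSON:

```python
# Smoke test: confirm the API is up and can see Ollama's models.
import requests

BASE = "http://127.0.0.1:8000"

print(requests.get(f"{BASE}/health", timeout=10).json())   # connectivity check
print(requests.get(f"{BASE}/models", timeout=10).json())   # model list proxied from Ollama
```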
If you want both Ollama + API in Docker:
```yaml
version: "3.9"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  ollama_api:
    build: .
    container_name: ollama_api
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
```

Run everything:
```bash
docker compose up -d --build
```

| Method | Endpoint | Description |
|---|---|---|
| GET | /models | List available models |
| GET / POST | /chat | Generate completions |
| GET | /stream | Stream tokens in real-time |
| GET | /playground | Web UI |
| GET | /health | Health check |
| GET | /warmup?model=phi3:mini | Warm up model |
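The /warmup endpoint exists to preload a model and keep it resident between requests. One way to implement it is an empty generate request to Ollama with keep_alive set (Ollama's documented preload behaviour); the sketch below assumes the requests library and may differ from the actual handler.

```python
# Warmup sketch: ask Ollama to load the model and keep it in memory.
import os

import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
KEEP_ALIVE = os.getenv("OLLAMA_KEEP_ALIVE", "30m")

def warmup(model: str = "phi3:mini") -> dict:
    # An empty /api/generate request loads the model; keep_alive controls how long it stays loaded.
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "keep_alive": KEEP_ALIVE},
        timeout=300,
    )
    resp.raise_for_status()
    return {"status": "warm", "model": model}
```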
Example JSON for /chat:

```json
{
  "prompt": "hello",
  "model": "phi3:mini",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 256
}
```
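Calling /chat from Python might look like this (a sketch assuming the requests library, the default 127.0.0.1:8000 binding, and a JSON response):

```python
# POST the example payload to /chat and print the completion.
import requests

payload = {
    "prompt": "hello",
    "model": "phi3:mini",
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256,
}

resp = requests.post("http://127.0.0.1:8000/chat", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json())
```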
- Use /stream for faster perceived responses (see the streaming sketch below)
- Warm up models with GET /warmup?model=phi3:mini
- Lightweight models (phi3:mini, qwen2.5-coder:7b) respond faster
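Because /stream returns plain text, tokens can be printed as they arrive. The sketch below assumes the prompt and model are passed as query parameters, which may differ from the actual implementation.

```python
# Read the /stream response incrementally and print tokens as they arrive.
import requests

params = {"prompt": "hello", "model": "phi3:mini"}  # parameter names are an assumption

with requests.get("http://127.0.0.1:8000/stream", params=params, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```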
502 from /health — verify OLLAMA_URL:
```bash
curl http://<host>:11434/api/tags
```

Linux host fix — add in docker-compose.yml:

```yaml
extra_hosts:
  - "host.docker.internal:host-gateway"
```

Then keep:

```env
OLLAMA_URL=http://host.docker.internal:11434
```

No models listed? — run:

```bash
ollama pull phi3:mini
```

Contents of .dockerignore:

```
__pycache__/
*.pyc
*.log
.venv/
.env
.git
.gitignore
tests/
```
MIT License
Copyright (c) 2025 Evgenii Matveev
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[standard MIT text continues...]
- README.md complete
- .env.example included
- .dockerignore added
- LICENSE added
- Optional CI/CD workflow ready
💻 GitHub Repository: Evgenii Matveev
🌐 Portfolio: Data Science Portfolio
📌 LinkedIn: Evgenii Matveev