Minimal FastAPI service that connects to Ollama and ships with a tiny web playground (model selector, temperature, top-p, max tokens, streaming).
Runs with one command via Docker Compose.
- FastAPI + Uvicorn — API & simple HTML playground
- Ollama — local LLMs (phi3, qwen2.5-coder, llama3, deepseek-r1, …)
- Docker / Compose — reproducible environment
- dotenv — config via .env
- pytest — optional tests (if you add them)
- GitHub Actions — optional CI/CD for container images
- GET /models — proxies Ollama /api/tags (model list)
- GET/POST /chat — non-streaming response
- GET /stream — token streaming (text/plain)
- GET /playground — minimal web UI
- GET /health — connectivity check
- (Optional) GET /warmup?model=phi3:mini — preload/keep model warm
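For reference, the /models proxy can be as small as the sketch below. This is a minimal sketch assuming httpx as the async HTTP client; the actual app/main.py may differ.

```python
# Minimal /models proxy sketch (httpx is an assumption; see app/main.py for the real route).
import os

import httpx
from fastapi import FastAPI, HTTPException

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")

app = FastAPI()

@app.get("/models")
async def list_models():
    # Forward the request to Ollama's /api/tags and return its JSON unchanged.
    try:
        async with httpx.AsyncClient(timeout=10) as client:
            resp = await client.get(f"{OLLAMA_URL}/api/tags")
            resp.raise_for_status()
            return resp.json()
    except httpx.HTTPError as exc:
        raise HTTPException(status_code=502, detail=f"Cannot reach Ollama: {exc}")
```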
```
.
├── app/
│   ├── __init__.py
│   └── main.py
├── .env                 # local (do NOT commit)
├── .env.example         # template to share
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
```

Copy .env.example → .env and adjust as needed.

```env
# If Ollama runs on the host (Windows/macOS)
OLLAMA_URL=http://host.docker.internal:11434
# Linux: either add extra_hosts in compose or use the docker bridge IP
# OLLAMA_URL=http://172.17.0.1:11434
DEFAULT_MODEL=phi3:mini
ALLOWED_MODELS=phi3:mini,qwen2.5-coder:7b,llama3.1:latest,deepseek-r1:7b
# Optional tuning
READ_TIMEOUT=300
OLLAMA_KEEP_ALIVE=30m
```
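Inside the app these values are typically read via python-dotenv, roughly as sketched below (variable names match .env.example; the real main.py may organize this differently).

```python
# Config loading sketch (python-dotenv; variable names match .env.example).
import os

from dotenv import load_dotenv

load_dotenv()  # copy variables from .env into the process environment

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "phi3:mini")
ALLOWED_MODELS = [m.strip() for m in os.getenv("ALLOWED_MODELS", DEFAULT_MODEL).split(",")]
READ_TIMEOUT = int(os.getenv("READ_TIMEOUT", "300"))
OLLAMA_KEEP_ALIVE = os.getenv("OLLAMA_KEEP_ALIVE", "30m")
```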
Start Ollama and pull at least one model:

```bash
ollama serve
ollama pull phi3:mini
```

Then build and run:

```bash
docker compose up -d --build
```

Access endpoints:
- 🎨 Playground → http://127.0.0.1:8000/playground
- 📘 Docs → http://127.0.0.1:8000/docs
- 💚 Health → http://127.0.0.1:8000/health
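A quick smoke test from Python, assuming the requests library is installed and both endpoints return JSON:

```python
# Smoke test: confirm the API is up and can see Ollama's models.
import requests

BASE = "http://127.0.0.1:8000"

print(requests.get(f"{BASE}/health", timeout=10).json())   # connectivity check
print(requests.get(f"{BASE}/models", timeout=10).json())   # model list proxied from Ollama
```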
If you want both Ollama + API in Docker:
```yaml
version: "3.9"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  ollama_api:
    build: .
    container_name: ollama_api
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
```

Run everything:
```bash
docker compose up -d --build
```

| Method | Endpoint | Description |
|---|---|---|
| GET | /models | List available models |
| GET / POST | /chat | Generate completions |
| GET | /stream | Stream tokens in real-time |
| GET | /playground | Web UI |
| GET | /health | Health check |
| GET | /warmup?model=phi3:mini | Warm up model |
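The /warmup endpoint exists to preload a model and keep it resident between requests. One way to implement it is an empty generate request to Ollama with keep_alive set (Ollama's documented preload behaviour); the sketch below assumes the requests library and may differ from the actual handler.

```python
# Warmup sketch: ask Ollama to load the model and keep it in memory.
import os

import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
KEEP_ALIVE = os.getenv("OLLAMA_KEEP_ALIVE", "30m")

def warmup(model: str = "phi3:mini") -> dict:
    # An empty /api/generate request loads the model; keep_alive controls how long it stays loaded.
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "keep_alive": KEEP_ALIVE},
        timeout=300,
    )
    resp.raise_for_status()
    return {"status": "warm", "model": model}
```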
Example JSON for /chat:

```json
{
  "prompt": "hello",
  "model": "phi3:mini",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 256
}
```
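Calling /chat from Python might look like this (a sketch assuming the requests library, the default 127.0.0.1:8000 binding, and a JSON response):

```python
# POST the example payload to /chat and print the completion.
import requests

payload = {
    "prompt": "hello",
    "model": "phi3:mini",
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256,
}

resp = requests.post("http://127.0.0.1:8000/chat", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json())
```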
- Use /stream for faster perceived responses (see the streaming sketch below)
- Warm up models with GET /warmup?model=phi3:mini
- Lightweight models (phi3:mini, qwen2.5-coder:7b) respond faster
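Because /stream returns plain text, tokens can be printed as they arrive. The sketch below assumes the prompt and model are passed as query parameters, which may differ from the actual implementation.

```python
# Read the /stream response incrementally and print tokens as they arrive.
import requests

params = {"prompt": "hello", "model": "phi3:mini"}  # parameter names are an assumption

with requests.get("http://127.0.0.1:8000/stream", params=params, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```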
502 from /health — verify OLLAMA_URL:
```bash
curl http://<host>:11434/api/tags
```

Linux host fix — add in docker-compose.yml:

```yaml
extra_hosts:
  - "host.docker.internal:host-gateway"
```

Then keep:

```env
OLLAMA_URL=http://host.docker.internal:11434
```

No models listed? — run:

```bash
ollama pull phi3:mini
```

Contents of .dockerignore:

```
__pycache__/
*.pyc
*.log
.venv/
.env
.git
.gitignore
tests/
```
MIT License
Copyright (c) 2025 Evgenii Matveev
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[standard MIT text continues...]
- README.md complete
- .env.example included
- .dockerignore added
- LICENSE added
- Optional CI/CD workflow ready
💻 GitHub Repository: Evgenii Matveev
🌐 Portfolio: Data Science Portfolio
📌 LinkedIn: Evgenii Matveev