A ready-to-use development environment template for Python projects with integrated LLM capabilities via Ollama.
- Pre-configured VS Code Dev Container setup with Docker Compose
- Python 3.10 environment with automatic dependency installation
- Integrated Ollama for LLM inference
- Example code for interacting with Ollama models
- Docker installed
- VS Code with the Dev Containers extension
- (Optional) GPU with CUDA or ROCm support for GPU-accelerated inference
- NVIDIA GPUs: NVIDIA Container Toolkit must be installed on the host system
- AMD GPUs: ROCm-compatible AMD GPU with ROCm drivers installed on the host system
- Docker must be configured to use the appropriate GPU runtime (a quick sanity check is sketched below)
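
As a quick sanity check for the NVIDIA case, you can ask Docker which runtimes it has registered. The sketch below is illustrative and not part of the template; the `docker info --format` call is standard Docker CLI, while the helper name is made up:

```python
# Illustrative prerequisite check: is an "nvidia" runtime registered with Docker?
import json
import shutil
import subprocess


def nvidia_runtime_available() -> bool:
    """Return True if `docker info` reports an 'nvidia' runtime."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return False
    return "nvidia" in json.loads(result.stdout or "{}")


if __name__ == "__main__":
    print("NVIDIA Docker runtime detected:", nvidia_runtime_available())
```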
- Click "Use this template" to create a new repository from this template
- Clone your new repository
- Open in VS Code
- When prompted, click "Reopen in Container"
- Wait for the container to build and initialize (this will pull required images and install dependencies)
- `.devcontainer/` - Dev Container configuration
- `src/` - Python source code modules
- `examples/` - Example scripts for embeddings, response generation, and image detection
- `requirements.txt` - Python dependencies
The template comes with a pre-configured Ollama service and a Python client for interacting with it.
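
Inside the dev container, a quick way to confirm the Ollama service is up is to query its `/api/tags` endpoint, which lists the models that have already been pulled. This is a minimal standard-library sketch; the host URL assumes the default docker-compose service name:

```python
# Connectivity check for the Ollama service bundled with this template.
import json
import os
import urllib.request

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

# GET /api/tags returns the models currently available on the server.
with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print("Ollama is reachable. Local models:", models or "none pulled yet")
```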
This template is configured to automatically use GPU acceleration (NVIDIA or AMD) if available. The docker-compose.yml file includes GPU resource reservations that will enable GPU support when:
For NVIDIA GPUs:
- Your system has an NVIDIA GPU
- NVIDIA Container Toolkit is installed on the host
- Docker is configured to use the NVIDIA runtime
For AMD GPUs:
- Your system has a ROCm-compatible AMD GPU
- ROCm drivers are installed on the host
- Docker has access to the `/dev/kfd` and `/dev/dri` devices
If no GPU is available, Ollama will automatically fall back to CPU inference.
To check if your setup is using GPU acceleration, run:
```bash
python examples/check_gpu.py
```

This script will:
- Test inference performance
- Display a tokens/second metric (GPU typically >50 tokens/sec, CPU typically <20 tokens/sec)
- Provide guidance on whether GPU acceleration is active (a minimal tokens/second measurement is sketched after this list)
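
For reference, the tokens/second figure can be derived from a single non-streaming call to Ollama's `/api/generate` endpoint, whose response includes `eval_count` (generated tokens) and `eval_duration` (in nanoseconds). The sketch below is illustrative; `check_gpu.py` may measure things differently, and the model name is an assumption:

```python
# Rough tokens/second measurement against the Ollama REST API (illustrative).
import json
import os
import urllib.request

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
MODEL = "gemma3:1b"  # assumed default model; any pulled model works

payload = json.dumps({
    "model": MODEL,
    "prompt": "Briefly explain what a container is.",
    "stream": False,  # single JSON response that includes timing fields
}).encode()

req = urllib.request.Request(
    f"{OLLAMA_HOST}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=300) as resp:
    result = json.load(resp)

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
tokens_per_sec = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/sec")
```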
You can also check GPU status directly:
```bash
# For NVIDIA GPUs - Check if GPU is accessible to the Ollama container
docker exec ollama nvidia-smi
# For AMD GPUs - Check ROCm GPU status
docker exec ollama rocm-smi
# View Ollama logs (GPU usage is logged during model loading)
docker logs ollama
```

If you have an NVIDIA GPU but GPU acceleration isn't working:
```bash
# Add the NVIDIA Container Toolkit repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
```

After installation, rebuild the dev container for the changes to take effect.
If you have an AMD GPU but GPU acceleration isn't working:
```bash
# Check if your AMD GPU is supported
# Visit: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html
# Install ROCm (Ubuntu/Debian)
# For Ubuntu 22.04
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_latest_all.deb
sudo apt-get install ./amdgpu-install_latest_all.deb
# Install ROCm components
sudo amdgpu-install --usecase=rocm --no-dkms
# Add user to render and video groups
sudo usermod -a -G render,video $USER
# Restart system for changes to take effect
sudo reboot
```

After installation, rebuild the dev container for the changes to take effect.
Note: AMD GPU support requires ROCm 5.7 or later. Check the official ROCm documentation for your specific GPU model and OS.
By default, the template is configured to use gemma3:1b. You can use any model from the Ollama Model Library.
```python
import os

# Import the Llama client
from src.llama import llama

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
MODEL = "llama3.2:1b" # Find available models here https://ollama.com/library
# Initialize with Ollama host and model
client = llama(OLLAMA_HOST, MODEL)
# Pull the model if not already available
client.check_and_pull_model()
# Generate a response
response = client.generate_response("What is the capital of France?")
```

Add any required packages to `requirements.txt` and they will be automatically installed when the container starts.
Change the model in the relevant example script (e.g., examples/text_generate_response.py) by modifying the model name. The template will automatically pull the model if it's not already available.
The `main.py` file has been removed; example functionality is now provided as separate Python scripts in the `examples/` folder:
- `text_embedding.py`: Example for generating embeddings (see the sketch below)
- `text_generate_response.py`: Example for generating responses
- `image_classification.py`: Example for image detection
- `check_gpu.py`: Check GPU availability and performance
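
To illustrate what the embedding example boils down to, here is a minimal request to Ollama's `/api/embeddings` endpoint. The actual `text_embedding.py` may go through the `src.llama` client instead, and the embedding model named here is an assumption that would need to be pulled first:

```python
# Minimal embedding request against the Ollama REST API (illustrative sketch).
import json
import os
import urllib.request

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")

payload = json.dumps({
    "model": "nomic-embed-text",  # assumed embedding model; pull it before running
    "prompt": "Dev containers make onboarding painless.",
}).encode()

req = urllib.request.Request(
    f"{OLLAMA_HOST}/api/embeddings",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    embedding = json.load(resp)["embedding"]

print(f"Embedding dimension: {len(embedding)}")
```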
To run an example, use:
```bash
python examples/check_gpu.py # Check GPU status and performance
python examples/text_embedding.py
python examples/text_generate_response.py
python examples/image_classification.py
```

Each script will initialize the specified Ollama model and perform its respective task.