| sidebar-title | Local Installation |
|---|---|
| description | Install and run Dynamo on a local machine or VM with containers or PyPI |
This guide walks through installing and running Dynamo on a local machine or VM with one or more GPUs. By the end, you'll have a working OpenAI-compatible endpoint serving a model.
For production multi-node clusters, see the Kubernetes Deployment Guide. To build from source for development, see Building from Source.
| Requirement | Supported |
|---|---|
| GPU | NVIDIA Ampere, Ada Lovelace, Hopper, Blackwell |
| OS | Ubuntu 22.04, Ubuntu 24.04 |
| Architecture | x86_64, ARM64 (ARM64 requires Ubuntu 24.04) |
| CUDA | 12.9+ or 13.0+ (B300/GB300 require CUDA 13) |
| Python | 3.10, 3.12 |
| Driver | 575.51.03+ (CUDA 12) or 580.00.03+ (CUDA 13) |
TensorRT-LLM does not support Python 3.11.
For the full compatibility matrix including backend framework versions, see the Support Matrix.
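If you want to check programmatically whether an installed driver meets these minimums, dotted version strings can be compared numerically. A minimal sketch (the `driver_ok` helper and the `MIN_DRIVER` table are illustrative, not part of Dynamo; the thresholds mirror the requirements table above):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted driver version like '575.51.03' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Minimum driver versions from the requirements table above, keyed by CUDA major version.
MIN_DRIVER = {12: "575.51.03", 13: "580.00.03"}

def driver_ok(installed: str, cuda_major: int) -> bool:
    """Check an installed driver version against the minimum for a CUDA major version."""
    return parse_version(installed) >= parse_version(MIN_DRIVER[cuda_major])
```

For example, `driver_ok("580.65.06", 13)` tells you whether that driver is new enough for CUDA 13. The installed version itself comes from `nvidia-smi` (see Troubleshooting below).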
Containers have all dependencies pre-installed. No setup required.
```bash
# SGLang
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0

# TensorRT-LLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0

# vLLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```

To run the frontend and worker in the same container, either:

- run processes in the background with `&` (see the Run Dynamo section below), or
- open a second terminal and attach with `docker exec -it <container_id> bash`.

See Release Artifacts for available versions, and the backend guides for run instructions: SGLang | TensorRT-LLM | vLLM
```bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment
uv venv venv
source venv/bin/activate
uv pip install pip
```

Install system dependencies and the Dynamo wheel for your chosen backend:
SGLang

```bash
sudo apt install python3-dev
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```

For CUDA 13 (B300/GB300), the container is recommended. See the SGLang install docs for details.
TensorRT-LLM

```bash
sudo apt install python3-dev
pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```

TensorRT-LLM requires pip due to a transitive Git URL dependency that uv doesn't resolve. We recommend using the TensorRT-LLM container for broader compatibility. See the TRT-LLM backend guide for details.
vLLM

```bash
sudo apt install python3-dev libxcb1
uv pip install --prerelease=allow "ai-dynamo[vllm]"
```

Dynamo components discover each other through a shared backend. Two options are available:
| Backend | When to Use | Setup |
|---|---|---|
| File | Single machine, local development | No setup -- pass `--discovery-backend file` to all components |
| etcd | Multi-node, production | Requires a running etcd instance (the default if no flag is specified) |

This guide uses `--discovery-backend file`. For etcd setup, see Service Discovery.
Verify the CLI is installed and callable:

```bash
python3 -m dynamo.frontend --help
```

If you cloned the repository, you can run additional system checks:

```bash
python3 deploy/sanity_check.py
```

Start the OpenAI-compatible frontend (default port is 8000):

```bash
python3 -m dynamo.frontend --discovery-backend file
```

To run everything in a single terminal (useful in containers), append `> logfile.log 2>&1 &` to run processes in the background:

```bash
python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &
```

In another terminal (or the same terminal if using background mode), start a worker for your chosen backend:
SGLang

```bash
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
```

TensorRT-LLM

```bash
python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
```

The warning `Cannot connect to ModelExpress server/transport error. Using direct download.` is expected in local deployments and can be safely ignored.

vLLM

```bash
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
    --kv-events-config '{"enable_kv_cache_events": false}'
```

For dependency-free local development, disable KV event publishing (this avoids needing NATS):
- vLLM: add `--kv-events-config '{"enable_kv_cache_events": false}'`
- SGLang: no flag needed (KV events disabled by default)
- TensorRT-LLM: no flag needed (KV events disabled by default)

For SGLang and TensorRT-LLM, pass `--kv-events-config` explicitly only when you want KV event publishing enabled.
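Before sending test requests, you can confirm the frontend is actually listening on its port. A minimal sketch (the `port_open` and `wait_for_port` helpers are illustrative, not part of Dynamo):

```python
import socket
import time

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_port(host: str, port: int, retries: int = 30, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or retries are exhausted."""
    for _ in range(retries):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False

# Example: the frontend listens on port 8000 by default.
# wait_for_port("localhost", 8000)
```

Once the port is up, exercise the endpoint: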
```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50}'
```

CUDA/driver version mismatch
Run `nvidia-smi` to check your driver version. Dynamo requires driver 575.51.03+ for CUDA 12 or 580.00.03+ for CUDA 13. B300/GB300 GPUs require CUDA 13. See the Support Matrix for full requirements.
Model doesn't fit on GPU (OOM)
The default model Qwen/Qwen3-0.6B requires ~2GB of GPU memory. Larger models need more VRAM:
| Model Size | Approximate VRAM |
|---|---|
| 7B | 14-16 GB |
| 13B | 26-28 GB |
| 70B | 140+ GB (multi-GPU) |
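The rough rule behind this table is about 2 bytes per parameter for FP16/BF16 weights, plus some headroom for the KV cache and activations. A back-of-the-envelope sketch (the `estimate_vram_gb` helper and its 10% overhead factor are assumptions for illustration, not a Dynamo formula):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: int = 2,
                     overhead: float = 0.1) -> float:
    """Rough serving VRAM estimate: weight bytes plus a fractional
    overhead for KV cache and activations (assumed, not exact)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes/GB
    return weights_gb * (1 + overhead)

for size in (0.6, 7, 13, 70):
    print(f"{size}B params -> ~{estimate_vram_gb(size):.1f} GB")
```

Real usage depends on context length, batch size, and quantization, so treat these numbers as a lower bound when sizing hardware.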
Start with a small model and scale up based on your hardware.
Python 3.11 with TensorRT-LLM
TensorRT-LLM does not support Python 3.11. If you see installation failures with TensorRT-LLM, check your Python version with `python3 --version`. Use Python 3.10 or 3.12 instead.
Container runs but GPU not detected
Ensure you passed `--gpus all` to `docker run`. Without this flag, the container won't have access to GPUs:

```bash
# Correct
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0

# Wrong -- no GPU access
docker run --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
```

- Backend Guides -- Backend-specific configuration and features
- Disaggregated Serving -- Scale prefill and decode independently
- KV Cache Aware Routing -- Smart request routing
- Kubernetes Deployment -- Production multi-node deployments