
Local Llama Stack Setup Guide

This guide will walk you through setting up and running a Llama Stack server with Ollama and Podman.


1. Prerequisites

Ensure you have the following installed:

- Podman
- Python 3 (with pip)
- Ollama

Verify installation:

podman --version
python3 --version
pip --version
ollama --version

2. Start Ollama

Before running Llama Stack, start the Ollama server with:

ollama run llama3.2:3b-instruct-fp16 --keepalive 60m

The --keepalive 60m flag keeps the model loaded in memory for 60 minutes after the last request, avoiding reload latency between calls.
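To confirm Ollama is actually serving before moving on, you can query its HTTP API, which listens on port 11434 by default:

```shell
# Optional sanity check: Ollama serves an HTTP API on port 11434 by default.
OLLAMA_URL="http://localhost:11434"
if curl -sf "${OLLAMA_URL}/api/tags" > /dev/null 2>&1; then
  echo "Ollama API is reachable at ${OLLAMA_URL}"
else
  echo "Ollama API is not reachable -- is 'ollama run' still active?"
fi
```

A successful check lists the loaded models under /api/tags; a failure usually means the ollama process has exited or is bound to a different port.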


3. Set Up Environment Variables

Export the model identifier and the server port; later commands reference both:

export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
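If you come back to this setup in a new shell, it is easy to forget one of these exports. The snippet below repeats the exports so it runs on its own, then uses the shell's `${VAR:?}` check, which aborts with an error if a variable is unset:

```shell
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321

# ${VAR:?message} aborts with an error if the variable is unset or empty
echo "Model: ${INFERENCE_MODEL:?INFERENCE_MODEL is not set}"
echo "Port:  ${LLAMA_STACK_PORT:?LLAMA_STACK_PORT is not set}"
```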

4. Run Llama Stack Server with Podman

Pull the required image:

podman pull docker.io/llamastack/distribution-ollama

Before executing the next command, make sure to create a local directory to mount into the container’s file system.

mkdir -p ~/.llama

Now run the server using:

podman run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.containers.internal:11434 \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT

If the container cannot reach the host's Ollama instance, create a dedicated network and run the server on it:

podman network create llama-net
podman run --privileged --network llama-net -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT

Verify the container is running:

podman ps
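Beyond checking that the container is up, you can probe the server itself. This is a sketch that assumes the Llama Stack REST API exposes a /v1/health endpoint on the configured port; adjust the path if your distribution version differs:

```shell
# Optional health check; the /v1/health path is an assumption based on the
# Llama Stack REST API and may differ across versions.
PORT="${LLAMA_STACK_PORT:-8321}"
if curl -sf "http://localhost:${PORT}/v1/health" > /dev/null 2>&1; then
  echo "Llama Stack server is responding on port ${PORT}"
else
  echo "No response on port ${PORT} -- check 'podman ps' and the container logs"
fi
```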

5. Set Up Python Environment

Create a virtual environment using uv and install required libraries:

pip install uv
uv sync
source .venv/bin/activate # macOS/Linux
# On Windows: .venv\Scripts\activate

Verify installation:

pip list | grep llama-stack-client

6. Configure the Client

Set up the client to connect to the Llama Stack server:

llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT

List available models:

llama-stack-client models list
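Once the client is configured, you can smoke-test inference end to end. The subcommand below is a sketch based on the llama-stack-client CLI's chat-completion command; the exact subcommand and flag names may vary between client versions:

```shell
# Send a test prompt through the configured client. The chat-completion
# subcommand and --message flag are assumptions; check your client's --help.
if command -v llama-stack-client > /dev/null 2>&1; then
  llama-stack-client inference chat-completion \
    --message "Hello! Which model are you?"
else
  echo "llama-stack-client not found -- activate the virtual environment first"
fi
```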

7. Quickly Set Up Your Environment

Now that your environment has gone through the initial setup, you can quickly return to a running Ollama and Llama Stack server using the setup_local target in the Makefile.

make setup_local

8. Debugging Common Issues

Check if Podman is Running:

podman ps

Ensure the Virtual Environment is Activated:

source .venv/bin/activate

Reinstall the Client if Necessary:

pip uninstall llama-stack-client
pip install llama-stack-client

Test Importing the Client in Python:

python -c "from llama_stack_client import LlamaStackClient; print(LlamaStackClient)"