
Local Llama Stack Setup Guide

This guide will walk you through setting up and running a Llama Stack server with Ollama and Podman.


1. Prerequisites

Ensure you have the following installed:

- Podman
- Python 3 (with pip)
- Ollama

Verify installation:

podman --version
python3 --version
pip --version
ollama --version

2. Start Ollama

Before running Llama Stack, start the Ollama server with:

ollama run llama3.2:3b-instruct-fp16 --keepalive 60m

The --keepalive 60m flag keeps the model loaded in memory for 60 minutes after the last request, avoiding reload latency between calls.
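To confirm Ollama is actually serving before moving on, you can query its HTTP API, which listens on port 11434 by default:

```shell
# Optional sanity check: Ollama serves an HTTP API on port 11434 by default.
OLLAMA_URL="http://localhost:11434"
if curl -sf "${OLLAMA_URL}/api/tags" > /dev/null 2>&1; then
  echo "Ollama API is reachable at ${OLLAMA_URL}"
else
  echo "Ollama API is not reachable -- is 'ollama run' still active?"
fi
```

A successful check lists the loaded models under /api/tags; a failure usually means the ollama process has exited or is bound to a different port.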


3. Set Up Environment Variables

Export the model identifier and the server port; later commands reference both:

export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
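If you come back to this setup in a new shell, it is easy to forget one of these exports. The snippet below repeats the exports so it runs on its own, then uses the shell's `${VAR:?}` check, which aborts with an error if a variable is unset:

```shell
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321

# ${VAR:?message} aborts with an error if the variable is unset or empty
echo "Model: ${INFERENCE_MODEL:?INFERENCE_MODEL is not set}"
echo "Port:  ${LLAMA_STACK_PORT:?LLAMA_STACK_PORT is not set}"
```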

4. Run Llama Stack Server with Podman

Pull the required image:

podman pull docker.io/llamastack/distribution-ollama

Before executing the next command, make sure to create a local directory to mount into the container’s file system.

mkdir -p ~/.llama

Now run the server using:

podman run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.containers.internal:11434 \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT

If the container cannot reach the host's Ollama instance, create a dedicated network and run the server on it:

podman network create llama-net
podman run --privileged --network llama-net -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT

Verify the container is running:

podman ps
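Beyond checking that the container is up, you can probe the server itself. This is a sketch that assumes the Llama Stack REST API exposes a /v1/health endpoint on the configured port; adjust the path if your distribution version differs:

```shell
# Optional health check; the /v1/health path is an assumption based on the
# Llama Stack REST API and may differ across versions.
PORT="${LLAMA_STACK_PORT:-8321}"
if curl -sf "http://localhost:${PORT}/v1/health" > /dev/null 2>&1; then
  echo "Llama Stack server is responding on port ${PORT}"
else
  echo "No response on port ${PORT} -- check 'podman ps' and the container logs"
fi
```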

5. Set Up Python Environment

Create a virtual environment using uv and install required libraries:

pip install uv
uv sync
source .venv/bin/activate # macOS/Linux
# On Windows: .venv\Scripts\activate

Verify installation:

pip list | grep llama-stack-client

6. Configure the Client

Set up the client to connect to the Llama Stack server:

llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT

List available models:

llama-stack-client models list
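Once the client is configured, you can smoke-test inference end to end. The subcommand below is a sketch based on the llama-stack-client CLI's chat-completion command; the exact subcommand and flag names may vary between client versions:

```shell
# Send a test prompt through the configured client. The chat-completion
# subcommand and --message flag are assumptions; check your client's --help.
if command -v llama-stack-client > /dev/null 2>&1; then
  llama-stack-client inference chat-completion \
    --message "Hello! Which model are you?"
else
  echo "llama-stack-client not found -- activate the virtual environment first"
fi
```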

7. Quickly Set Up Your Environment

Now that your environment has gone through the initial setup, you can quickly return to a running Ollama and Llama Stack server using the setup_local target in the Makefile.

make setup_local

8. Debugging Common Issues

Check if Podman is Running:

podman ps

Ensure the Virtual Environment is Activated:

source .venv/bin/activate

Reinstall the Client if Necessary:

pip uninstall llama-stack-client
pip install llama-stack-client

Test Importing the Client in Python:

python -c "from llama_stack_client import LlamaStackClient; print(LlamaStackClient)"