This project provides tooling to profile vLLM workloads on consumer AMD Radeon GPUs. It orchestrates model runs inside Docker containers, isolates GPUs per run, and captures logs and profiling artifacts for analysis.
⚠️ Status: Early-stage ✅ Platform: Linux only (for now)
Goals:
- Enable reproducible profiling of vLLM workloads on Radeon GPUs
- Support multi-GPU systems with per-run isolation
- Preserve model downloads between runs
- Collect logs, traces, and performance data in a structured way
This directory is mounted into Docker containers as the Hugging Face cache.
Purpose:
- Preserve downloaded models between runs
- Avoid repeated downloads across container executions
This directory contains local, user-specific configuration files. It is expected to be customized per machine and not shared verbatim across systems.
Defines the GPUs available to the orchestrator and their associated settings.
Example:

```yaml
gpus:
  - name: Radeon RX 7900 XTX
    device: /dev/dri/renderD128
    env:
      GPU_MEM_UTIL: "0.9"
  - name: AMD Radeon RX 9070 XT
    device: /dev/dri/renderD129
    env:
      GPU_MEM_UTIL: "0.85"
  - name: AMD Radeon RX 6700 XT
    device: /dev/dri/renderD130
    disabled: true
```

Notes:
- Used by `orchestrator.py` to:
  - isolate GPUs per model run
  - pass GPU-specific environment variables to containers
- GPU environment variables take precedence, overriding model-specific env vars
- GPUs marked `disabled: true` will be ignored
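The disabled-GPU and env-precedence rules above can be sketched as follows. This is an illustration, not the actual orchestrator code; `select_gpus` and `merge_env` are hypothetical names, and the config is assumed to be already parsed from `gpus.yaml` into Python dicts:

```python
# Sketch: filter out disabled GPUs and merge env vars, with GPU-specific
# values overriding model-specific ones (as the notes above describe).

def select_gpus(config: dict) -> list[dict]:
    """Return only GPUs not marked `disabled: true`."""
    return [g for g in config.get("gpus", []) if not g.get("disabled", False)]

def merge_env(model_env: dict, gpu_env: dict) -> dict:
    """GPU env vars take precedence over model env vars."""
    return {**model_env, **gpu_env}

config = {
    "gpus": [
        {"name": "Radeon RX 7900 XTX", "device": "/dev/dri/renderD128",
         "env": {"GPU_MEM_UTIL": "0.9"}},
        {"name": "AMD Radeon RX 6700 XT", "device": "/dev/dri/renderD130",
         "disabled": True},
    ]
}

active = select_gpus(config)  # the 6700 XT is dropped
env = merge_env({"GPU_MEM_UTIL": "0.5", "MODEL": "llama"}, active[0]["env"])
# env["GPU_MEM_UTIL"] is "0.9": the GPU-level value wins
```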
Provides token-based credentials to Docker containers.
Example:

```yaml
tokens:
  HF_TOKEN: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Notes:
- Currently only the Hugging Face token is supported
- Required only for certain models (e.g. the currently disabled `google/embeddinggemma`)
All logs, traces, and profiling outputs are written here.
Directory layout:

```
.logs/
└── GPU_NAME/
    └── MODEL_NAME/
        └── <log_and_trace_files>
```
This structure allows easy comparison:
- across GPUs
- across models
- across multiple runs
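A minimal sketch of how such a per-GPU, per-model path might be built with `pathlib`; the helper name and the character substitutions are illustrative assumptions, not the project's actual API:

```python
from pathlib import Path

def log_dir(root: str, gpu_name: str, model_name: str) -> Path:
    """Build a .logs/GPU_NAME/MODEL_NAME/ path, replacing characters
    that are awkward in directory names (spaces, slashes)."""
    def safe(s: str) -> str:
        return s.replace("/", "_").replace(" ", "_")
    return Path(root) / safe(gpu_name) / safe(model_name)

path = log_dir(".logs", "Radeon RX 7900 XTX", "meta-llama/Llama-3-8B")
# -> .logs/Radeon_RX_7900_XTX/meta-llama_Llama-3-8B
```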
Holds inference-time image data used by multimodal models.
Notes:
- Empty by default in the repository
- You may provide your own images
- Image filenames must match what the prompt YAML files expect (see `yaml/prompts/`)
Contains all project scripts, split into host-side and container-side logic.
Run directly on the host system:
- `orchestrator.py`: the main entry point. Coordinates:
  - GPU selection
  - container execution
  - model runs
  - log collection
- `docker_tool.py`: Docker-related utilities used by the orchestrator
- `generate_gpu_yaml.sh`: helper script to auto-generate a `gpus.yaml` template. NOTE: this script is currently defunct; create your own YAML config based on the example in the README and your own setup
- Future host utilities will also live here
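To illustrate how the pieces described so far could come together in a container invocation (the Hugging Face cache mount, per-GPU render-node passthrough, and forwarded env vars), here is a hedged sketch. The function name, flag choices, and mount target are assumptions, not the actual `docker_tool.py` implementation:

```python
def build_docker_args(image: str, device: str, hf_cache: str,
                      env: dict) -> list[str]:
    """Assemble a `docker run` argument list: mount the HF cache,
    pass a single render node for GPU isolation, and forward env vars."""
    args = ["docker", "run", "--rm",
            "-v", f"{hf_cache}:/root/.cache/huggingface",
            "--device", device]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args

cmd = build_docker_args("rocm/vllm", "/dev/dri/renderD128",
                        "/srv/hf_cache", {"GPU_MEM_UTIL": "0.9"})
```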
Executed inside Docker containers:
- `run_model.py`: entry point for all model runs:
  - sets up environment variables
  - launches the appropriate model runner
  - captures stdout/stderr and artifacts
- Model-specific runner scripts, tailored to individual models or groups of models
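The stdout/stderr capture step can be sketched with `subprocess`; this is a simplified illustration with a hypothetical helper name, not the actual `run_model.py` code:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_and_capture(cmd: list[str], log_dir: Path) -> int:
    """Run a command, writing its stdout and stderr to log files,
    and return the exit code."""
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / "stdout.log", "w") as out, \
         open(log_dir / "stderr.log", "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode

d = Path(tempfile.mkdtemp())
rc = run_and_capture([sys.executable, "-c", "print('hello')"], d)
```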
Contains configuration files that define what to run and how to run it.
Defines which models should be executed.
Specifies environment variables to forward into model runners.
Contains prompt definitions, grouped by model type.
Notes:
- Each model (or model family) has its own prompt YAML
- Prompts may optionally reference multimedia inputs (e.g. images)
- If a prompt references images, the corresponding files must exist in the `images/` directory
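A validation pass for that requirement might look like the sketch below, assuming prompts have already been parsed into dicts with an optional `images` list; the function name and prompt shape are illustrative assumptions:

```python
from pathlib import Path

def missing_images(prompts: list[dict], images_dir: Path) -> list[str]:
    """Return image filenames referenced by prompts but absent
    from images_dir."""
    missing = []
    for prompt in prompts:
        for name in prompt.get("images", []):
            if not (images_dir / name).is_file():
                missing.append(name)
    return missing

# Example: one referenced image exists, one does not.
import tempfile
d = Path(tempfile.mkdtemp())
(d / "cat.png").touch()
prompts = [{"text": "Describe the picture", "images": ["cat.png", "dog.png"]}]
missing = missing_images(prompts, d)
```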
- Linux-only support at this stage
- Some models may be temporarily disabled
- Image-based prompts require manual population of the `images/` directory
- Broader model coverage
- Improved automation and validation
- Expanded profiling support
- Potential non-Linux support