This project provides tooling to profile vLLM workloads on consumer AMD Radeon GPUs. It orchestrates model runs inside Docker containers, isolates GPUs per run, and captures logs and profiling artifacts for analysis.
⚠️ Status: Early-stage ✅ Platform: Linux only (for now)
Goals:
- Enable reproducible profiling of vLLM workloads on Radeon GPUs
- Support multi-GPU systems with per-run isolation
- Preserve model downloads between runs
- Collect logs, traces, and performance data in a structured way
This directory is mounted into Docker containers as the Hugging Face cache.
Purpose:
- Preserve downloaded models between runs
- Avoid repeated downloads across container executions
This directory contains local, user-specific configuration files. It is expected to be customized per machine and not shared verbatim across systems.
Defines the GPUs available to the orchestrator and their associated settings.
Example:

```yaml
gpus:
  - name: Radeon RX 7900 XTX
    device: /dev/dri/renderD128
    env:
      GPU_MEM_UTIL: "0.9"
  - name: AMD Radeon RX 9070 XT
    device: /dev/dri/renderD129
    env:
      GPU_MEM_UTIL: "0.85"
  - name: AMD Radeon RX 6700 XT
    device: /dev/dri/renderD130
    disabled: true
```

Notes:
- Used by `orchestrator.py` to:
  - isolate GPUs per model run
  - pass GPU-specific environment variables to containers
- GPU environment variables take precedence, overriding model-specific env vars
- GPUs marked `disabled: true` will be ignored
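The disabled-GPU and env-precedence rules above can be sketched as follows. This is an illustration, not the actual orchestrator code; `select_gpus` and `merge_env` are hypothetical names, and the config is assumed to be already parsed from `gpus.yaml` into Python dicts:

```python
# Sketch: filter out disabled GPUs and merge env vars, with GPU-specific
# values overriding model-specific ones (as the notes above describe).

def select_gpus(config: dict) -> list[dict]:
    """Return only GPUs not marked `disabled: true`."""
    return [g for g in config.get("gpus", []) if not g.get("disabled", False)]

def merge_env(model_env: dict, gpu_env: dict) -> dict:
    """GPU env vars take precedence over model env vars."""
    return {**model_env, **gpu_env}

config = {
    "gpus": [
        {"name": "Radeon RX 7900 XTX", "device": "/dev/dri/renderD128",
         "env": {"GPU_MEM_UTIL": "0.9"}},
        {"name": "AMD Radeon RX 6700 XT", "device": "/dev/dri/renderD130",
         "disabled": True},
    ]
}

active = select_gpus(config)  # the 6700 XT is dropped
env = merge_env({"GPU_MEM_UTIL": "0.5", "MODEL": "llama"}, active[0]["env"])
# env["GPU_MEM_UTIL"] is "0.9": the GPU-level value wins
```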
Provides token-based credentials to Docker containers.
Example:

```yaml
tokens:
  HF_TOKEN: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Notes:
- Currently only the Hugging Face token is supported
- Required only for certain models (e.g. the currently disabled `google/embeddinggemma`)
All logs, traces, and profiling outputs are written here.
Directory layout:

```
.logs/
└── GPU_NAME/
    └── MODEL_NAME/
        └── <log_and_trace_files>
```
This structure allows easy comparison:
- across GPUs
- across models
- across multiple runs
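A minimal sketch of how such a per-GPU, per-model path might be built with `pathlib`; the helper name and the character substitutions are illustrative assumptions, not the project's actual API:

```python
from pathlib import Path

def log_dir(root: str, gpu_name: str, model_name: str) -> Path:
    """Build a .logs/GPU_NAME/MODEL_NAME/ path, replacing characters
    that are awkward in directory names (spaces, slashes)."""
    def safe(s: str) -> str:
        return s.replace("/", "_").replace(" ", "_")
    return Path(root) / safe(gpu_name) / safe(model_name)

path = log_dir(".logs", "Radeon RX 7900 XTX", "meta-llama/Llama-3-8B")
# -> .logs/Radeon_RX_7900_XTX/meta-llama_Llama-3-8B
```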
Holds inference-time image data used by multimodal models.
Notes:
- Empty by default in the repository
- You may provide your own images
- Image filenames must match what the prompt YAML files expect (see `yaml/prompts/`)
Contains all project scripts, split into host-side and container-side logic.
Run directly on the host system:
- `orchestrator.py`: the main entry point. Coordinates:
  - GPU selection
  - container execution
  - model runs
  - log collection
- `docker_tool.py`: Docker-related utilities used by the orchestrator
- `generate_gpu_yaml.sh`: helper script to auto-generate a `gpus.yaml` template. NOTE: this script is currently defunct; create your own YAML config based on the example in the README and your own setup
- Future host utilities will also live here
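To illustrate how the pieces described so far could come together in a container invocation (the Hugging Face cache mount, per-GPU render-node passthrough, and forwarded env vars), here is a hedged sketch. The function name, flag choices, and mount target are assumptions, not the actual `docker_tool.py` implementation:

```python
def build_docker_args(image: str, device: str, hf_cache: str,
                      env: dict) -> list[str]:
    """Assemble a `docker run` argument list: mount the HF cache,
    pass a single render node for GPU isolation, and forward env vars."""
    args = ["docker", "run", "--rm",
            "-v", f"{hf_cache}:/root/.cache/huggingface",
            "--device", device]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args

cmd = build_docker_args("rocm/vllm", "/dev/dri/renderD128",
                        "/srv/hf_cache", {"GPU_MEM_UTIL": "0.9"})
```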
Executed inside Docker containers:
- `run_model.py`: entry point for all model runs:
  - sets up environment variables
  - launches the appropriate model runner
  - captures stdout/stderr and artifacts
- Model-specific runner scripts, tailored to individual models or groups of models
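The stdout/stderr capture step can be sketched with `subprocess`; this is a simplified illustration with a hypothetical helper name, not the actual `run_model.py` code:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_and_capture(cmd: list[str], log_dir: Path) -> int:
    """Run a command, writing its stdout and stderr to log files,
    and return the exit code."""
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / "stdout.log", "w") as out, \
         open(log_dir / "stderr.log", "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode

d = Path(tempfile.mkdtemp())
rc = run_and_capture([sys.executable, "-c", "print('hello')"], d)
```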
Contains configuration files that define what to run and how to run it.
Defines which models should be executed.
Specifies environment variables to forward into model runners.
Contains prompt definitions, grouped by model type.
Notes:
- Each model (or model family) has its own prompt YAML
- Prompts may optionally reference multimedia inputs (e.g. images)
- If a prompt references images, the corresponding files must exist in the `images/` directory
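A validation pass for that requirement might look like the sketch below, assuming prompts have already been parsed into dicts with an optional `images` list; the function name and prompt shape are illustrative assumptions:

```python
from pathlib import Path

def missing_images(prompts: list[dict], images_dir: Path) -> list[str]:
    """Return image filenames referenced by prompts but absent
    from images_dir."""
    missing = []
    for prompt in prompts:
        for name in prompt.get("images", []):
            if not (images_dir / name).is_file():
                missing.append(name)
    return missing

# Example: one referenced image exists, one does not.
import tempfile
d = Path(tempfile.mkdtemp())
(d / "cat.png").touch()
prompts = [{"text": "Describe the picture", "images": ["cat.png", "dog.png"]}]
missing = missing_images(prompts, d)
```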
- Linux-only support at this stage
- Some models may be temporarily disabled
- Image-based prompts require manual population of the `images/` directory
- Broader model coverage
- Improved automation and validation
- Expanded profiling support
- Potential non-Linux support