Comprehensive performance evaluation framework for vLLM on CPU platforms.
This repository provides a complete testing methodology, automation tools, and platform configurations for evaluating vLLM inference performance on CPU-based systems.
See the Quick Start Guide
```text
vllm-cpu-perf-eval/
├── README.md                      # This file
│
├── models/                        # Centralized model definitions
│   ├── models.md                  # Comprehensive model documentation
│   ├── llm-models/                # LLM model configurations
│   │   ├── model-matrix.yaml      # LLM model test mappings
│   │   └── llm-models.md          # Redirects to models.md
│   └── embedding-models/          # Embedding model configurations
│       └── model-matrix.yaml      # Embedding model test mappings
│
├── tests/                         # Test suites and scenarios
│   ├── tests.md                   # Test suite overview
│   ├── concurrent-load/           # Test Suite 1: Concurrent load testing
│   │   ├── concurrent-load.md     # Suite documentation
│   │   └── *.yaml                 # Test scenario definitions
│   ├── scalability/               # Test Suite 2: Scalability testing
│   │   ├── scalability.md         # Suite documentation
│   │   └── *.yaml                 # Test scenario definitions
│   ├── resource-contention/       # Test Suite 3: Resource contention
│   │   ├── resource-contention.md # Suite documentation
│   │   └── *.yaml                 # Test scenario definitions (planned)
│   └── embedding-models/          # Embedding model test scenarios
│       ├── embedding-models.md    # Embedding test documentation
│       ├── baseline-sweep.yaml    # Baseline performance tests
│       └── latency-concurrent.yaml # Latency tests
│
├── automation/                    # Automation framework
│   ├── test-execution/            # Test orchestration
│   │   ├── ansible/               # Ansible playbooks (primary)
│   │   │   ├── ansible.md         # Ansible documentation
│   │   │   ├── inventory/         # Host configurations
│   │   │   ├── filter_plugins/    # Custom Ansible filters
│   │   │   ├── roles/             # Ansible roles
│   │   │   ├── tests/             # Ansible tests
│   │   │   └── *.yml              # Playbook files
│   │   ├── bash/                  # Bash automation scripts
│   │   └── embedding/             # Embedding test scripts
│   ├── platform-setup/            # Platform configuration
│   │   ├── bash/                  # Platform setup scripts
│   │   └── intel/                 # Intel-specific setup
│   └── utilities/                 # Helper utilities
│       ├── health-checks/         # Health check scripts
│       └── log-monitoring/        # Log analysis tools
│
├── docs/                          # Documentation
│   ├── docs.md                    # Documentation index
│   ├── methodology/               # Test methodology
│   │   └── overview.md            # Testing approach and metrics
│   └── platform-setup/            # Platform setup guides
│
├── results/                       # Test results (gitignored)
│   ├── llm/                       # LLM test results
│   └── results.md                 # Results documentation
│
├── utils/                         # Utility scripts and tools
│
├── Configuration Files
├── .pre-commit-config.yaml        # Pre-commit hooks configuration
├── .yamllint.yaml                 # YAML linting rules
├── .markdownlint-cli2.yaml        # Markdown linting rules
└── .gitignore                     # Git ignore patterns
```
Key Directories:
- models/ - Model definitions reused across all test suites
- tests/ - Test suite definitions organized by testing focus
- automation/test-execution/ansible/ - Ansible playbooks for test execution
- docs/ - Comprehensive testing methodology and guides
- results/ - Local test results (gitignored, see results.md)
See individual directory markdown files for detailed information.
- Docker or Podman - Use either runtime
- Auto-detection - Automatically detects available runtime
- Rootless support - Full Podman rootless compatibility
- Define models once, use across all test phases
- Easy to add new models
- Model matrix for flexible test configuration
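The actual schema lives in `models/llm-models/model-matrix.yaml`; as a rough illustration only (the field names below are hypothetical, not the repository's real schema), a matrix entry might map a model to the suites that exercise it:

```yaml
# Hypothetical sketch of a model-matrix entry -- see
# models/llm-models/model-matrix.yaml for the actual schema.
models:
  - name: TinyLlama-1.1B
    hf_id: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    profile: balanced          # prefill-heavy | decode-heavy | balanced
    suites:
      - concurrent-load
      - scalability
```

Defining the model once and referencing it by name from each suite is what lets a new model be added in a single place.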
- Intel Xeon (Ice Lake, Sapphire Rapids)
- AMD EPYC
- ARM64 (planned)
- Ansible playbooks for platform setup and test execution
- Bash scripts for manual operation
- Docker/Podman Compose for containerized testing
- Distributed testing across multiple nodes
- Concurrent Load: Concurrent load testing
- Scalability: Scalability and sweep testing
- Resource Contention: Resource contention testing (planned)
- Time-based testing - Consistent 10-minute tests across CPU types
- Single-user baseline - Concurrency=1 for efficiency calculations
- Variable workloads - Realistic traffic simulation with statistical variance
- Prefix caching control - Baseline vs production comparison
- 3-phase testing - Baseline → Realistic → Production methodology
- Large model support - Added gpt-oss-20b (21B MoE) for scalability testing
See 3-Phase Testing Strategy for details.
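The scenario YAMLs under `tests/` encode these knobs. As a hypothetical sketch (field names here are illustrative, not the repository's actual scenario schema), a Phase 1 baseline scenario might look like:

```yaml
# Illustrative only -- the real scenario definitions live under tests/.
scenario: baseline-single-user
phase: 1                  # Baseline -> Realistic -> Production
duration_seconds: 600     # consistent 10-minute runs across CPU types
concurrency: 1            # single-user baseline for efficiency calculations
prefix_caching: false     # disabled for baseline, enabled for production comparison
workload:
  variable: false         # Phase 2/3 add statistical variance
```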
- Ansible Testing - Complete Ansible usage guide
- Methodology - Testing methodology and metrics
- Platform Setup - Intel platform configuration
- Models - Model definitions and selection
- Tests - Test suite documentation
Full documentation index: docs/docs.md
IMPORTANT: Validation Status and Availability

SUPPORTED (Fully Validated):
- Concurrent Load Testing (Phase 1 & Phase 2) - Ready for use
- Playbooks: `llm-benchmark-concurrent-load.yml`, `llm-benchmark-auto.yml`
- Documentation: tests/concurrent-load/concurrent-load.md
NOT YET SUPPORTED (Blocked from End User Execution):
The following test suites are work in progress and are automatically blocked to prevent end users from running them until they are fully validated:
Scalability - Work in progress; blocked by default
- Playbook: `llm-core-sweep-auto.yml` (will fail with an error message)
- Contains: sweep, synchronous, poisson tests
Embedding Models - Work in progress; blocked by default
- Playbook: `embedding-benchmark.yml` (will fail with an error message)
- Scripts: `run-baseline.sh`, `run-latency.sh`, `run-all.sh` (will exit with an error)

Resource Contention - Planned; not yet implemented
- No test files exist yet
Bypass for Development/Testing Only:
If you need to run unsupported tests for development or testing purposes:
- Ansible: Add `-e "allow_unsupported_tests=true"` to your playbook command
- Bash: Export `ALLOW_UNSUPPORTED_TESTS=true` before running scripts

Note: Unsupported tests are provided as-is, with no guarantee that they will work without modification. Only use them if you understand the risks and are willing to troubleshoot issues independently.
Tests model performance under various concurrent request loads.
- Concurrency levels: 1, 2, 4, 8, 16, 32
- 8 LLM generative models (embedding models not yet supported)
- Focus: P95 latency, TTFT, throughput scaling
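The single-user baseline is what makes scaling efficiency computable: at concurrency c, efficiency is throughput(c) divided by c times the concurrency=1 throughput. A minimal sketch of that arithmetic (the throughput numbers are placeholders, not measured results):

```shell
#!/bin/sh
# Scaling efficiency relative to the concurrency=1 baseline:
#   efficiency(c) = throughput(c) / (c * throughput(1))
# All throughput values below are made-up placeholders.
baseline_tps=12.0   # req/s measured at concurrency=1

efficiency() {
  c=$1; tps=$2
  awk -v c="$c" -v tps="$tps" -v base="$baseline_tps" \
      'BEGIN { printf "c=%-3s tps=%-6s efficiency=%.2f\n", c, tps, tps / (c * base) }'
}

efficiency 1 12.0    # perfect scaling by definition
efficiency 8 72.0
efficiency 32 150.0
```

An efficiency near 1.0 means throughput is still scaling linearly with concurrency; a sharp drop marks the saturation point.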
NOT YET SUPPORTED - This test suite is blocked by default. See the validation status above for details.
Characterizes maximum throughput and performance curves.
- Sweep tests for capacity discovery
- Synchronous baseline tests
- Poisson distribution tests
- Focus: Maximum capacity, saturation points
PLANNED - Not yet implemented.
Multi-tenant and resource sharing scenarios.
Current model coverage:
LLM Models (8 total):
- Llama-3.2 (1B, 3B) - Prefill-heavy
- TinyLlama-1.1B - Balanced small-scale
- OPT (125M, 1.3B) - Decode-heavy legacy baseline
- Granite-3.2-2B - Balanced enterprise
- Qwen3-0.6B, Qwen2.5-3B - High-efficiency balanced
Embedding Models:
NOT YET SUPPORTED - Embedding model tests are blocked by default. These models are defined, but testing is not yet validated.
- granite-embedding-english-r2
- granite-embedding-278m-multilingual
See models/models.md for complete model definitions, selection rationale, and how to add new models.
- CPU: Intel Xeon (Ice Lake or newer) or AMD EPYC
- Memory: 64GB+ RAM recommended
- OS: Ubuntu 22.04+, RHEL 9+, or Fedora 38+
- Storage: 500GB+ for models and results
- Python 3.10+
- Docker 24.0+ or Podman 4.0+
- Ansible 2.14+ (for automation)
- GuideLLM v0.5.0+
- vLLM
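A quick way to confirm the software prerequisites above is to probe PATH for each tool. This is a convenience sketch, not a script shipped in this repository:

```shell
#!/bin/sh
# Prerequisite check sketch (not part of the repository): reports
# whether each required tool is on PATH.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok:      $1"
  else
    echo "missing: $1"
  fi
}

check python3             # Python 3.10+
check ansible-playbook    # Ansible 2.14+ (for automation)
check docker              # Docker 24.0+ ...
check podman              # ... or Podman 4.0+ (either runtime works)
```

Version checks (e.g. Ansible 2.14+ vs. whatever is installed) would still need to be done per tool; this only verifies presence.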
See Ansible Documentation for setup and configuration instructions.
This repository supports both Docker and Podman:
- Docker: Traditional container runtime
- Podman: Daemonless, rootless-capable alternative
- Auto-detection: Automatically uses available runtime
The Ansible playbooks automatically detect and use the available container runtime. For manual configuration, see the vllm_server role documentation.
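The auto-detection amounts to probing PATH for each runtime in turn. A sketch of the idea (not the playbooks' actual implementation, and the podman-first preference order is an assumption):

```shell
#!/bin/sh
# Sketch of container-runtime auto-detection (illustrative only):
# prefer podman if present, fall back to docker, else report none.
detect_runtime() {
  if command -v podman >/dev/null 2>&1; then
    echo podman
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo none
  fi
}

runtime=$(detect_runtime)
echo "Using container runtime: $runtime"
```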
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Run pre-commit checks: `pre-commit run --all-files`
- Submit a pull request
This repository uses pre-commit to ensure code quality.
```bash
# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install
pre-commit install --hook-type commit-msg

# Run manually
pre-commit run --all-files
```

[Add license information]
- Documentation: See docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions