Conversation

@hazemawadalla

This commit introduces a comprehensive KV Cache benchmark suite designed to measure storage system performance under AI/ML inference workloads, specifically targeting Large Language Model (LLM) key-value cache operations.

Key components added:

  • Core benchmark scripts (kv-cache.py, kv-cache_sharegpt_replay.py)
  • Benchmark wrapper and validation tools (kv-cache-wrapper.sh, validate.sh)
  • Comprehensive proposal documentation for MLPerf Storage v3 integration
  • README with benchmark overview and usage guidelines

The benchmark simulates realistic LLM inference patterns, including:

  • Key-value cache read/write operations
  • Mixed sequential and random access patterns
  • Multi-threaded concurrent access scenarios
  • Conversation-based workload replay using the ShareGPT dataset

This work addresses the growing need to standardize storage performance measurements for AI inference workloads and provides a foundation for the MLPerf Storage v3.0 KV cache benchmark specification. A minimal sketch of the kind of access loop such a benchmark exercises appears below.
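As context for reviewers, here is a minimal, hypothetical sketch of the mixed, multi-threaded KV-block access pattern described above. It is not the kv-cache.py implementation; every name and constant in it (CACHE_DIR, BLOCK_SIZE, READ_RATIO, etc.) is illustrative only.

import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

CACHE_DIR = "/tmp/kvcache"   # assumed mount point of the storage under test
BLOCK_SIZE = 256 * 1024      # one KV block per (layer, token-chunk); illustrative
NUM_BLOCKS = 1024
READ_RATIO = 0.7             # mixed read/write workload
NUM_OPS = 10000

os.makedirs(CACHE_DIR, exist_ok=True)

def one_op(_):
    # Pick a random block: random access across the cached KV blocks.
    path = os.path.join(CACHE_DIR, f"kv_{random.randrange(NUM_BLOCKS)}.bin")
    start = time.perf_counter()
    if random.random() < READ_RATIO and os.path.exists(path):
        with open(path, "rb") as f:   # cache hit: read the KV block back
            f.read()
    else:
        with open(path, "wb") as f:   # cache fill: write a fresh KV block
            f.write(os.urandom(BLOCK_SIZE))
    return time.perf_counter() - start

# Multi-threaded concurrent access, as in the scenarios listed above.
with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(one_op, range(NUM_OPS)))
print(f"mean op latency: {sum(latencies) / len(latencies) * 1e3:.2f} ms")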

@hazemawadalla hazemawadalla requested a review from a team November 21, 2025 19:50
@hazemawadalla hazemawadalla requested a review from a team as a code owner November 21, 2025 19:50
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@hazemawadalla hazemawadalla changed the base branch from main to TF_KVCache November 21, 2025 22:52
Contributor

@FileSystemGuy FileSystemGuy left a comment

Initial version being imported.

Contributor

@wvaske wvaske left a comment

We will turn comments into issues once this is merged.

total_prob = sum(chunk_probabilities)
chunk_probabilities = [p / total_prob for p in chunk_probabilities]

retrieved_indices = np.random.choice(
Contributor

TODO: Add support for different random distributions (random, uniform, zipfian)
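One way the requested distributions could be wired in, sketched below. This is not code from the PR; chunk_weights, num_chunks, and alpha are hypothetical names.

import numpy as np

def chunk_weights(num_chunks, distribution="uniform", alpha=1.0):
    if distribution == "uniform":
        weights = np.ones(num_chunks)
    elif distribution == "zipfian":
        # Zipf-like decay: chunk k gets weight 1 / k**alpha, so early
        # chunks are retrieved far more often than late ones.
        weights = 1.0 / np.arange(1, num_chunks + 1) ** alpha
    elif distribution == "random":
        # Fresh random weights per call.
        weights = np.random.random(num_chunks)
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    return weights / weights.sum()  # normalize, as the existing code does

# e.g.: np.random.choice(num_chunks, size=k, p=chunk_weights(num_chunks, "zipfian"))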


# --- Tiering Logic ---
# Decide which tier to write to based on available memory.
with self.memory_lock:
Contributor

New KVs should be written to the top tier and trigger eviction from that tier if sufficient space doesn't exist.
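A sketch of that policy, for discussion: write every new KV to the top tier, evicting the oldest entries and demoting them one tier down when space runs out. This is illustrative only; TieredKVCache, the tier layout, and the entry format are assumptions, not code from this PR.

import threading
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, capacities):
        # tier 0 is the fastest (e.g. DRAM); entries kept in insertion order
        self.memory_lock = threading.Lock()
        self.tiers = [{"cap": c, "used": 0, "data": OrderedDict()}
                      for c in capacities]

    def put(self, key, value):
        # New KVs always land in the top tier.
        with self.memory_lock:
            self._insert(key, value, tier=0)

    def _insert(self, key, value, tier):
        if tier >= len(self.tiers):
            return  # fell off the bottom tier: the entry is dropped
        t = self.tiers[tier]
        # Evict oldest entries until this tier has room, demoting each
        # evicted entry one tier down (cascading if that tier is full too).
        while t["used"] + len(value) > t["cap"] and t["data"]:
            old_key, old_value = t["data"].popitem(last=False)
            t["used"] -= len(old_value)
            self._insert(old_key, old_value, tier + 1)
        t["data"][key] = value
        t["used"] += len(value)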

@FileSystemGuy FileSystemGuy merged commit 39246aa into mlcommons:TF_KVCache Nov 25, 2025
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 25, 2025