Add initial KV Cache benchmark implementation for MLPerf Storage v3 #214
Conversation
This commit introduces a comprehensive KV Cache benchmark suite designed to measure storage system performance under AI/ML inference workloads, specifically targeting Large Language Model (LLM) key-value cache operations.

Key components added:
- Core benchmark scripts (kv-cache.py, kv-cache_sharegpt_replay.py)
- Benchmark wrapper and validation tools (kv-cache-wrapper.sh, validate.sh)
- Comprehensive proposal documentation for MLPerf Storage v3 integration
- README with benchmark overview and usage guidelines

The benchmark simulates realistic LLM inference patterns, including:
- Key-value cache read/write operations
- Mixed sequential and random access patterns
- Multi-threaded concurrent access scenarios
- Conversation-based workload replay using the ShareGPT dataset

This work addresses the growing need to standardize storage performance measurement for AI inference workloads and provides a foundation for the MLPerf Storage v3.0 KV cache benchmark specification.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
FileSystemGuy left a comment:
Initial version being imported.
wvaske left a comment:
We will turn comments into issues once this is merged.
```python
total_prob = sum(chunk_probabilities)
chunk_probabilities = [p / total_prob for p in chunk_probabilities]
retrieved_indices = np.random.choice(
```
TODO: Add support for different random distributions (random, uniform, zipfian)
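The TODO above could be addressed by selecting the probability vector passed to `np.random.choice` based on a distribution name. A minimal sketch, assuming hypothetical names (`sample_chunks`, `num_chunks`, `zipf_s`) that are not part of the benchmark code:

```python
import numpy as np

def sample_chunks(num_chunks, count, distribution="uniform", zipf_s=1.1, rng=None):
    """Draw `count` distinct chunk indices under the given access distribution."""
    rng = rng or np.random.default_rng()
    if distribution == "uniform":
        # Every chunk is equally likely.
        probs = np.full(num_chunks, 1.0 / num_chunks)
    elif distribution == "zipfian":
        # Rank-based Zipf weights: low-index chunks are "hot".
        ranks = np.arange(1, num_chunks + 1)
        probs = 1.0 / ranks ** zipf_s
        probs /= probs.sum()  # normalize, as in the snippet above
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return rng.choice(num_chunks, size=count, replace=False, p=probs)
```

A Zipfian option is a common choice here because real KV cache reuse is typically skewed toward a small set of hot prefixes rather than uniform.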
```python
# --- Tiering Logic ---
# Decide which tier to write to based on available memory.
with self.memory_lock:
```
New KVs should be written to the top layer and trigger eviction from a tier if sufficient space doesn't exist.
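The policy suggested above (always write to the top tier, evicting older entries downward when space runs out) can be sketched as an LRU tier with demotion. The class and field names below are illustrative assumptions, not the benchmark's actual implementation:

```python
from collections import OrderedDict

class TopTierCache:
    """Writes always land in this tier; least-recently-written entries
    are demoted to the next tier down when capacity is exceeded."""

    def __init__(self, capacity_bytes, lower_tier=None):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> size, oldest first
        self.lower_tier = lower_tier

    def put(self, key, size):
        # Evict (demote) oldest entries until the new KV block fits.
        while self.used + size > self.capacity and self.entries:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            if self.lower_tier is not None:
                self.lower_tier.put(old_key, old_size)
        self.entries[key] = size
        self.used += size
```

This keeps the top tier as the single write target, with eviction pressure cascading down the hierarchy, rather than routing new writes to whichever tier happens to have free space.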