SSD benchmarks based on Mooncake Trace #1613
alogfans wants to merge 6 commits into kvcache-ai:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request introduces a new benchmarking tool designed to evaluate the I/O performance of KVCache storage systems. It simulates realistic workloads based on Mooncake's OffsetAllocator and vLLM's PagedAttention architectures, providing detailed metrics on latency, bandwidth, and cache hit rates to help optimize storage performance.
Activity
Code Review
The pull request introduces a new SSD benchmark tool and its accompanying documentation. The Python script is well-structured, utilizing pathlib, dataclasses, and argparse effectively. It employs low-level file I/O operations (os.pread, os.pwrite, os.fsync) appropriate for a storage benchmark, including sparse file pre-allocation and detailed statistics collection. The documentation is clear and provides good guidance on usage and metrics. Several areas for improvement have been identified, primarily concerning the use of magic numbers, resource management, and clarity in metric calculations.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```python
# Source: https://lmcache.ai/kv_cache_calculator.html
MODEL_BYTES_PER_TOKEN = {
    # Small models (7B-13B)
    "llama-2-7b": 512,
```
Seems like most of the numbers in this table are not correct
I have updated the values using the default configuration of https://lmcache.ai/kv_cache_calculator.html
Yes. The results from this calculator seem somewhat inaccurate, since it applies the same formula to all models regardless of their architectural differences. There is a similar calculator on Hugging Face: https://huggingface.co/spaces/gaunernst/kv-cache-calculator. The results from these two calculators also seem inconsistent with each other.
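For context, calculators like these typically use the standard per-token KV cache size formula for multi-head or grouped-query attention (it does not apply to MLA-style caches such as DeepSeek's, which is one plausible reason the two calculators disagree). A minimal sketch, with Llama-2-7B parameters (32 layers, 32 KV heads, head dim 128, fp16) as an illustrative check against the table's 512 KiB entry:

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int = 2) -> int:
    """Per-token KV cache size: 2 (K and V) x layers x KV heads x head dim x dtype size."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Llama-2-7B: 32 layers, 32 KV heads (MHA), head_dim 128, fp16 (2 bytes)
print(kv_bytes_per_token(32, 32, 128, 2))  # 524288 bytes = 512 KiB per token
```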
```python
start = time.time()

# Generate simulated data
data = os.urandom(self.block_size_bytes)
```
os.urandom() is called after start = time.time(), so the CPU time for generating random data is included in the measured write latency. For a 1MB block, os.urandom typically takes 3–8ms on a modern CPU, while an actual NVMe write takes 0.1–0.5ms. This means the write latency numbers are dominated by CPU cost, not disk I/O. Please move the os.urandom() call before start = time.time().
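The suggested fix can be sketched as follows: generate the payload before starting the timer so only the pwrite/fsync pair is measured (file path and cleanup here are illustrative, not from the PR):

```python
import os
import tempfile
import time

block_size_bytes = 1 << 20  # 1 MiB

# Generate the payload *before* starting the timer so the CPU cost of
# os.urandom() is excluded from the measured write latency.
data = os.urandom(block_size_bytes)

path = os.path.join(tempfile.gettempdir(), "bench_block.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    start = time.time()
    written = os.pwrite(fd, data, 0)  # timed region covers only the I/O
    os.fsync(fd)
    write_ms = (time.time() - start) * 1000
finally:
    os.close(fd)
    os.unlink(path)
```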
```python
        int: File descriptor
    """
    if self.fd is None:
        # Use O_RDWR | O_CREAT, no O_DIRECT (Python compatibility)
```
The file is opened without O_DIRECT, so all reads go through the OS page cache. After a block is written, subsequent reads of the same block will be served from memory rather than SSD. The sample output in the docs appears to confirm this — read P50/P95/P99 are all identical at 0.280ms, which is inconsistent with real SSD behavior. As a result, the benchmark does not actually measure SSD read performance. Consider using O_DIRECT with aligned buffers, or calling posix_fadvise(POSIX_FADV_DONTNEED) after each write to evict the page from cache. At minimum, the docs should note that results are affected by the page cache.
We have added posix_fadvise.
```markdown
## Requirements

- Python 3.8+
```
I think we only support Python 3.10+ now, since PyTorch also does.
Thoughts: It can be used as a standalone tool, so I think keeping this may be better.
stmatengss left a comment
Code Review Summary
Verdict: Approve with suggestions
This PR adds a valuable storage benchmark tool for evaluating KVCache I/O performance using Mooncake traces. The implementation is well-structured and documented.
Strengths ✅
Architecture
- Clean offset allocator design mimicking Mooncake's approach
- Single-file storage avoids file explosion
- Uses pread/pwrite for thread-safe operations
- Comprehensive statistics tracking
Documentation
- Excellent inline comments explaining design decisions
- Clear examples in docstrings
- Well-structured code with logical sections
Functionality
- Supports multiple model configurations (LLaMA, Qwen, DeepSeek, etc.)
- Timestamp replay for realistic workload simulation
- Detailed latency percentile tracking (p50/p95/p99)
Issues & Suggestions
1. Performance: fsync on every write is expensive
Line 260: os.fsync(fd) after every block write will severely impact throughput. Consider:
- Batch fsync every N writes
- Make fsync optional via flag
- Use O_SYNC flag instead for better performance
2. Memory: Unbounded latency list growth
Lines 140-141, 231: Latency lists grow unbounded. For large traces (millions of requests), this will consume GBs of memory. Consider:
- Streaming percentile calculation (t-digest or reservoir sampling)
- Periodic aggregation and reset
- Max list size with sampling
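One way to bound memory, sketched here with reservoir sampling (Vitter's Algorithm R); this is an illustrative helper, not part of the PR, and percentiles become approximate once the trace exceeds the reservoir capacity:

```python
import random

class LatencyReservoir:
    """Fixed-size uniform sample of latencies; memory stays bounded
    regardless of how many requests the trace contains."""

    def __init__(self, capacity: int = 100_000, seed: int = 0):
        self.capacity = capacity
        self.count = 0                      # total latencies seen
        self.sample: list[float] = []       # at most `capacity` entries
        self._rng = random.Random(seed)

    def add(self, latency_ms: float) -> None:
        self.count += 1
        if len(self.sample) < self.capacity:
            self.sample.append(latency_ms)
        else:
            # Replace a random slot with probability capacity/count,
            # keeping the sample uniform over everything seen so far.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = latency_ms

    def percentile(self, p: float) -> float:
        s = sorted(self.sample)
        idx = min(len(s) - 1, int(p / 100 * len(s)))
        return s[idx]
```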
3. Security: os.urandom is slow
Line 251: os.urandom() is cryptographically secure but slow. For benchmarking, use:

```python
data = bytes(self.block_size_bytes)  # Zero-filled
# or
data = bytearray(self.block_size_bytes)
```

4. Bug: File descriptor leak risk
No try/finally in _get_fd(). If exception occurs, fd may leak. Add proper cleanup.
5. Missing: Error handling
- No validation for trace file format
- No handling of disk full scenarios
- No cleanup on benchmark failure
6. Configuration: Hard-coded constants
- BLOCK_SIZE_TOKENS=512 should be configurable
- MIN_LATENCY_MS seems arbitrary
7. Testing: No unit tests
Checklist shows tests not added. Consider adding:
- Unit tests for OffsetAllocatorStorage
- Mock trace processing tests
- Edge case validation
See inline comments for specific code suggestions.
Description
- Adds a new storage benchmark tool (benchmarks/storage_benchmark/storage_benchmark.py) to evaluate KVCache storage I/O performance
- Adds documentation (docs/source/performance/storage-benchmark.md)

Module
- mooncake-transfer-engine
- mooncake-store
- mooncake-ep
- mooncake-integration
- mooncake-p2p-store
- mooncake-wheel
- mooncake-pg
- mooncake-rl

Type of Change
How Has This Been Tested?
N/A
Checklist
- Run ./scripts/code_format.sh before submitting.