Add vllm:prefix_cache_hits and vllm:prefix_cache_queries counters by InfraWhisperer · Pull Request #358 · llm-d/llm-d-inference-sim

InfraWhisperer · 2026-02-23T15:43:08Z

Summary

Add vllm:prefix_cache_hits and vllm:prefix_cache_queries Prometheus counters matching vLLM v1 token-level semantics
Per request, queries increments by the total number of prompt tokens and hits by the number of those tokens already cached, so rate(hits) / rate(queries) gives the cache hit rate
Follows existing channel + async updater goroutine pattern (kvCacheUsageChan → kvCacheUsageUpdater)
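
To make the counting semantics and the channel-plus-updater pattern above concrete, here is a minimal sketch. It is not the code from this PR: the type and function names (prefixCacheUpdate, prefixCacheCounters, simulate) are hypothetical, and plain integers stand in for the Prometheus counters so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// prefixCacheUpdate is a hypothetical per-request message: how many prompt
// tokens were looked up and how many of them were already cached.
type prefixCacheUpdate struct {
	queriedTokens int
	hitTokens     int
}

// prefixCacheCounters stands in for the two Prometheus counters.
// (Assumption: the real simulator would call Add on counter metrics instead.)
type prefixCacheCounters struct {
	mu      sync.Mutex
	queries int
	hits    int
}

// run drains the channel and applies each update. It closes done once the
// updates channel has been closed and fully drained.
func (c *prefixCacheCounters) run(updates <-chan prefixCacheUpdate, done chan<- struct{}) {
	for u := range updates {
		c.mu.Lock()
		c.queries += u.queriedTokens
		c.hits += u.hitTokens
		c.mu.Unlock()
	}
	close(done)
}

// snapshot returns the current counter values.
func (c *prefixCacheCounters) snapshot() (queries, hits int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.queries, c.hits
}

// simulate sends one update per request through the channel, waits for the
// updater goroutine to finish, and returns the final totals.
func simulate(reqs []prefixCacheUpdate) (queries, hits int) {
	c := &prefixCacheCounters{}
	updates := make(chan prefixCacheUpdate, 16)
	done := make(chan struct{})
	go c.run(updates, done)
	for _, r := range reqs {
		updates <- r
	}
	close(updates)
	<-done
	return c.snapshot()
}

func main() {
	// First request: 100 prompt tokens, nothing cached yet.
	// Second request: 100 prompt tokens, 80 of them shared with the first.
	q, h := simulate([]prefixCacheUpdate{
		{queriedTokens: 100, hitTokens: 0},
		{queriedTokens: 100, hitTokens: 80},
	})
	fmt.Printf("queries=%d hits=%d hit_rate=%.2f\n", q, h, float64(h)/float64(q))
}
```

With these two requests the sketch prints queries=200 hits=80 hit_rate=0.40. Because both counters are in tokens, a request that misses the cache entirely still adds to queries.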

Closes #356

Test plan

Verify go build ./... passes
Run simulator with --enable-kvcache and confirm both counters appear on /metrics
Send repeated prompts with shared prefixes, verify prefix_cache_hits increments on subsequent requests
Confirm counters stay at zero when --enable-kvcache is not set
Validate rate(vllm:prefix_cache_hits[5m]) / rate(vllm:prefix_cache_queries[5m]) produces expected hit rate in Prometheus
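
For a quick check without a Prometheus server, the two counters can be read from a single /metrics scrape and divided directly. The sketch below is only an illustration under stated assumptions. It assumes the counters are exposed under exactly the names in the PR title, with no suffix. The model_name label and the sample values are made up. The helper names counterValue and hitRate are hypothetical. It also computes the lifetime ratio hits / queries, not the windowed rate() ratio from the last test-plan step.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics is invented scrape output in Prometheus text exposition format.
const sampleMetrics = "# TYPE vllm:prefix_cache_queries counter\n" +
	"vllm:prefix_cache_queries{model_name=\"m\"} 200\n" +
	"# TYPE vllm:prefix_cache_hits counter\n" +
	"vllm:prefix_cache_hits{model_name=\"m\"} 80\n"

// counterValue sums every sample of the named metric found in the text.
// It reports false if no sample with that exact name is present.
func counterValue(exposition, name string) (float64, bool) {
	var sum float64
	found := false
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// The name must be followed by a label set or whitespace;
		// otherwise this line belongs to a different, longer metric name.
		if rest == "" || (rest[0] != '{' && rest[0] != ' ' && rest[0] != '\t') {
			continue
		}
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		sum += v
		found = true
	}
	return sum, found
}

// hitRate returns hits / queries, or false when either counter is missing
// or no tokens have been queried yet.
func hitRate(exposition string) (float64, bool) {
	queries, okQ := counterValue(exposition, "vllm:prefix_cache_queries")
	hits, okH := counterValue(exposition, "vllm:prefix_cache_hits")
	if !okQ || !okH || queries == 0 {
		return 0, false
	}
	return hits / queries, true
}

func main() {
	if r, ok := hitRate(sampleMetrics); ok {
		fmt.Printf("lifetime hit rate: %.2f\n", r)
	} else {
		fmt.Println("hit rate unavailable")
	}
}
```

Returning false when queries is zero covers the case where --enable-kvcache is not set and both counters stay at zero, which would otherwise be a division by zero.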

github-actions · 2026-02-23T15:43:21Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Expose token-level prefix cache metrics matching vLLM v1 semantics.
Both counters increment per-request: queries by total prompt tokens,
hits by tokens found already cached. Enables computing cache hit
rate via rate(hits) / rate(queries) for scorer strategy benchmarking.

Closes llm-d#356

Signed-off-by: InfraWhisperer <raghav.potluri21@gmail.com>

InfraWhisperer force-pushed the prefix-cache-metrics branch 2 times, most recently from 06409c6 to c2cb138, on February 23, 2026 15:46

InfraWhisperer force-pushed the prefix-cache-metrics branch from c2cb138 to bc803b4 on February 23, 2026 15:50

InfraWhisperer wants to merge 1 commit into llm-d:main from InfraWhisperer:prefix-cache-metrics