Add vllm:prefix_cache_hits and vllm:prefix_cache_queries counters #358

Open
InfraWhisperer wants to merge 1 commit into llm-d:main from InfraWhisperer:prefix-cache-metrics

@InfraWhisperer InfraWhisperer commented Feb 23, 2026

Summary

  • Add vllm:prefix_cache_hits and vllm:prefix_cache_queries Prometheus counters matching vLLM v1 token-level semantics
  • vllm:prefix_cache_queries increments by the total number of prompt tokens in each request, and vllm:prefix_cache_hits by the number of those tokens already cached, so rate(hits) / rate(queries) gives the token-level cache hit rate
  • Follows the existing channel + async updater goroutine pattern (kvCacheUsageChan → kvCacheUsageUpdater)
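The channel + updater goroutine pattern described in the summary can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code in this PR: `prefixCacheStats`, `prefixCacheCounters`, `runUpdater`, and `replay` are hypothetical names, and the atomic integers stand in for the two Prometheus counters, which the actual change would presumably register with a Prometheus client library.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// prefixCacheStats is a hypothetical per-request update.
// Both fields are token counts, not request counts.
type prefixCacheStats struct {
	promptTokens int64 // total prompt tokens in the request
	cachedTokens int64 // prompt tokens already present in the prefix cache
}

// prefixCacheCounters stands in for the two monotonically increasing counters.
type prefixCacheCounters struct {
	queries atomic.Int64
	hits    atomic.Int64
}

// runUpdater drains the channel and applies each update to the counters.
// It closes done once the channel has been closed and fully drained.
func runUpdater(ch <-chan prefixCacheStats, c *prefixCacheCounters, done chan<- struct{}) {
	for s := range ch {
		c.queries.Add(s.promptTokens)
		c.hits.Add(s.cachedTokens)
	}
	close(done)
}

// replay sends the updates through the channel, waits for the updater
// goroutine to finish, and returns the final counter values.
func replay(updates []prefixCacheStats) (queries, hits int64) {
	ch := make(chan prefixCacheStats, len(updates))
	done := make(chan struct{})
	var c prefixCacheCounters
	go runUpdater(ch, &c, done)
	for _, u := range updates {
		ch <- u
	}
	close(ch)
	<-done
	return c.queries.Load(), c.hits.Load()
}

func main() {
	// First request: 100 prompt tokens, nothing cached yet.
	// Second request: 100 prompt tokens, 80 of them shared with the first.
	q, h := replay([]prefixCacheStats{
		{promptTokens: 100, cachedTokens: 0},
		{promptTokens: 100, cachedTokens: 80},
	})
	fmt.Printf("queries=%d hits=%d\n", q, h) // prints "queries=200 hits=80"
}
```

Sending updates over a channel keeps the request path from touching the metrics directly; the trade-off is that counter values lag slightly behind request handling.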

Closes #356

Test plan

  • Verify go build ./... passes
  • Run simulator with --enable-kvcache and confirm both counters appear on /metrics
  • Send repeated prompts with shared prefixes, verify prefix_cache_hits increments on subsequent requests
  • Confirm counters stay at zero when --enable-kvcache is not set
  • Validate rate(vllm:prefix_cache_hits[5m]) / rate(vllm:prefix_cache_queries[5m]) produces expected hit rate in Prometheus
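The /metrics checks above could be scripted along these lines. This is a hedged sketch, not part of the PR: `counterTotal` and `hitRate` are hypothetical helpers, the `model_name` label and sample values are made up for illustration, and the parser covers only the simple `name{labels} value` form of the Prometheus text format. The ratio of deltas between two scrapes approximates what `rate(hits) / rate(queries)` yields over the same window.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterTotal sums every sample of the named metric in a Prometheus
// text exposition. The bool reports whether any sample was found.
func counterTotal(exposition, name string) (float64, bool) {
	var total float64
	found := false
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// Require '{' or ' ' next so longer metric names do not match.
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue
		}
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		total += v
		found = true
	}
	return total, found
}

// hitRate returns the token-level hit rate between two scrapes.
// It reports false when no new prompt tokens were queried in between.
func hitRate(prevHits, prevQueries, curHits, curQueries float64) (float64, bool) {
	dq := curQueries - prevQueries
	if dq <= 0 {
		return 0, false
	}
	return (curHits - prevHits) / dq, true
}

func main() {
	// Hypothetical scrapes taken before and after a repeated-prefix request.
	before := "vllm:prefix_cache_hits{model_name=\"m\"} 0\n" +
		"vllm:prefix_cache_queries{model_name=\"m\"} 100\n"
	after := "vllm:prefix_cache_hits{model_name=\"m\"} 80\n" +
		"vllm:prefix_cache_queries{model_name=\"m\"} 200\n"

	h0, _ := counterTotal(before, "vllm:prefix_cache_hits")
	q0, _ := counterTotal(before, "vllm:prefix_cache_queries")
	h1, _ := counterTotal(after, "vllm:prefix_cache_hits")
	q1, _ := counterTotal(after, "vllm:prefix_cache_queries")

	if r, ok := hitRate(h0, q0, h1, q1); ok {
		fmt.Printf("hit rate: %.2f\n", r)
	}
}
```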

@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@InfraWhisperer InfraWhisperer force-pushed the prefix-cache-metrics branch 2 times, most recently from 06409c6 to c2cb138 on February 23, 2026 15:46

Expose token-level prefix cache metrics matching vLLM v1 semantics.
Both counters increment per-request: queries by total prompt tokens,
hits by tokens found already cached. Enables computing cache hit rate
via rate(hits) / rate(queries) for scorer strategy benchmarking.

Closes llm-d#356

Signed-off-by: InfraWhisperer <raghav.potluri21@gmail.com>