Labels
enhancement (New feature or request)
Description
Use Case
An agent that uses memory. Both explicit and implicit caching require the repeated prompt content to be at the beginning of the request.
Problem Statement
- LLM use is expensive. I want to benefit from the explicit and implicit caching available from the various providers that support these features.
- Explicit caching is a double-edged sword: it can provide significant savings, but it is not free, and a poorly tuned cache may end up costing more than it saves.
How This Feature Would Help
Reduces LLM costs
Proposed Solution
Move repetitive instruction text to the top of the prompt so it forms a stable prefix across API requests.
Keep statistics on the repetitive text so the user can evaluate whether it was eligible for caching and whether it was actually cached.
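To make the second point concrete, here is a minimal sketch of what such eligibility statistics could look like. Everything here is a hypothetical helper, not existing project code; the 1024-token minimum and the 4-characters-per-token estimate are rough provider-dependent assumptions.

```python
import os

MIN_CACHEABLE_TOKENS = 1024  # provider-dependent minimum prefix size (assumption)

def common_prefix(prompts: list[str]) -> str:
    """Longest prefix shared by every prompt in the list."""
    if not prompts:
        return ""
    return os.path.commonprefix(prompts)

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return len(text) // 4

def cache_eligibility(prompts: list[str]) -> dict:
    """Report how much stable prefix the requests share and whether
    that prefix is likely long enough for the provider to cache."""
    prefix = common_prefix(prompts)
    tokens = estimate_tokens(prefix)
    return {
        "prefix_chars": len(prefix),
        "estimated_prefix_tokens": tokens,
        "likely_cacheable": tokens >= MIN_CACHEABLE_TOKENS,
    }

# Example: two requests sharing a long instruction block.
instructions = "You are a helpful agent. " * 300  # ~7500 chars of stable prefix
stats = cache_eligibility([instructions + "Question A", instructions + "Question B"])
print(stats)
```

A real implementation would use the provider's tokenizer instead of the character heuristic, and would compare the estimate against the `cache_read` counts actually reported in the API usage data.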
Explicit Caching
Providers such as Google and Anthropic support explicit caching.
When using the Anthropic SDK, the Gemini SDK, or the LiteLLM SDK (with the Anthropic/Gemini providers, or the OpenRouter provider with Anthropic/Gemini models), I'd like to be able to enable explicit caching.
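For reference, with Anthropic this is done per request by attaching `cache_control` to the last block of the stable prefix. A minimal sketch of the request body follows; the model id and instruction text are placeholders, and the actual call would go through the SDK as noted in the comments:

```python
# Sketch of an Anthropic Messages API request with explicit caching enabled.
# The stable instruction prefix is marked with cache_control, so the provider
# caches everything up to and including that block.
long_instructions = "System instructions repeated on every request... " * 100

request = {
    "model": "claude-sonnet-4-20250514",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_instructions,
            # Marks the end of the cacheable prefix; "ephemeral" is the
            # short-lived cache type.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "What changed since my last run?"}],
}

# With the anthropic SDK installed, this would be sent as:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
# and response.usage reports cache_creation_input_tokens /
# cache_read_input_tokens, which the proposed statistics could record.
print(request["system"][0]["cache_control"])
```

Gemini and LiteLLM expose the equivalent capability through their own interfaces (Gemini via a separate cached-content object with a TTL, LiteLLM by passing `cache_control` through to the underlying provider), so the feature would need a small per-provider adapter layer.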
TTL
Via the CLI/UI/environment variables, I'd like to be able to:
- Configure the cache TTL
- Monitor cache spending vs. savings for a given TTL in production, so I can adjust the TTL and compare the resulting savings against the previous setting.
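To make that comparison measurable, the tracker could fold the cache-related usage counts into a net-savings figure. A sketch with illustrative price multipliers (cache writes billed at 1.25x the base input rate and cache reads at 0.1x; these are assumptions that should be checked against each provider's current pricing):

```python
from dataclasses import dataclass

# Illustrative multipliers relative to the base input-token price (assumptions).
CACHE_WRITE_MULTIPLIER = 1.25  # writing the prefix into the cache
CACHE_READ_MULTIPLIER = 0.10   # reading the prefix back from the cache

@dataclass
class CacheUsage:
    cache_creation_input_tokens: int  # tokens written to the cache
    cache_read_input_tokens: int      # tokens served from the cache

def net_savings(usage: CacheUsage, base_price_per_token: float) -> float:
    """Savings vs. resending the cached tokens as ordinary input every time.

    Positive means caching paid off; negative means the cache writes cost
    more than the reads saved (e.g. the TTL expired before any reuse).
    """
    extra_write_cost = (usage.cache_creation_input_tokens
                        * base_price_per_token * (CACHE_WRITE_MULTIPLIER - 1.0))
    read_savings = (usage.cache_read_input_tokens
                    * base_price_per_token * (1.0 - CACHE_READ_MULTIPLIER))
    return read_savings - extra_write_cost

# Example: one cache write of 10k tokens, reused five times within the TTL.
usage = CacheUsage(cache_creation_input_tokens=10_000,
                   cache_read_input_tokens=50_000)
print(net_savings(usage, base_price_per_token=3e-6))
```

Aggregating this per TTL setting over a production window is exactly what would let the user see whether a longer (more expensive to write) or shorter TTL wins for their traffic pattern.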
Alternatives Considered
No response
Priority
Nice to have
Additional Context
The goal is to:
- Ensure the prompt is cacheable and can benefit the most from caching
- Provide reasonable defaults for the Explicit cache
- Allow the user to find the best TTL settings to optimize caching benefit
Checklist
- I would be willing to contribute this feature