🐛 fix(quota): add centralized quota exhaustion detection to prevent doomed requests #51
Conversation
…oomed requests

Add a `_is_quota_exhausted` method to UsageManager that checks if quota for a credential/model combination is fully consumed. This method is now called in all key acquisition paths to ensure exhausted keys are skipped.

Changes:
- Add `_is_quota_exhausted(key, model)` method for centralized quota check
- Update `get_available_credentials_for_model` to skip exhausted keys
- Update `acquire_key` priority path to skip exhausted keys
- Update `acquire_key` non-priority path to skip exhausted keys
- Enhance `update_quota_baseline` to accept optional `quota_reset_ts`
- Auto-set model cooldowns when quota is 0% and reset time is known
- Propagate cooldowns across all models in a quota group

This ensures the proxy accurately respects API quota limits and avoids making requests that would result in StreamedAPIError.

Tested: ~8 hours in live environment with no issues.
Starting my review of the quota exhaustion detection fix. I'll be examining the new
Overall Assessment
This PR implements a well-structured fix for quota exhaustion detection. The centralized _is_quota_exhausted method is cleanly integrated across all three credential acquisition paths (get_available_credentials_for_model, priority path in acquire_key, and non-priority path in acquire_key).
I initially questioned whether the quota check needed to account for quota groups, but on closer inspection, the design is correct: update_quota_baseline already syncs both request_count and quota_max_requests across all models in a quota group at write time, so the per-model check naturally works for grouped quotas as well.
Architectural Feedback
The approach of adding quota_reset_ts passthrough from the API response and auto-setting cooldowns when quota is depleted is sensible. This creates a dual-layer protection: immediate quota checks prevent selection of exhausted credentials, while cooldowns provide time-based recovery.
Key Suggestions
- Documentation: The `0.001` threshold for triggering cooldowns could use a brief comment explaining its purpose.
- DRY: The cooldown-setting logic for grouped models is duplicated and could be extracted into a helper.
Both are minor maintainability improvements and non-blocking.
Nitpicks and Minor Points
None beyond the inline comments.
Questions for the Author
- Is the `0.001` threshold intentionally chosen as a precision tolerance (accounting for floating-point issues), or does it represent a "practically zero" quota state?
This review was generated by an AI assistant.
src/rotator_library/usage_manager.py
Outdated
```python
if quota_reset_ts:
    model_data["quota_reset_ts"] = quota_reset_ts
    # Also set as model cooldown if it's in the future and quota is low
    if quota_reset_ts > now_ts and remaining_fraction <= 0.001:
```
The threshold 0.001 (0.1% remaining) triggers cooldowns when quota is nearly exhausted. Consider adding a brief inline comment explaining this value (e.g., # 0.1% threshold - effectively zero quota) to document the design intent for future maintainers.
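The suggested documentation can be demonstrated as a small standalone predicate; the function name `should_cooldown` is hypothetical, but the condition and threshold are taken verbatim from the diff above:

```python
REMAINING_EPS = 0.001  # 0.1% threshold - effectively zero quota


def should_cooldown(quota_reset_ts: float, now_ts: float,
                    remaining_fraction: float) -> bool:
    # Set a model cooldown only when the reset time is in the future
    # and the remaining quota is practically zero.
    return quota_reset_ts > now_ts and remaining_fraction <= REMAINING_EPS


print(should_cooldown(2000.0, 1000.0, 0.0005))  # practically zero -> True
print(should_cooldown(2000.0, 1000.0, 0.05))    # 5% remaining -> False
```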
src/rotator_library/usage_manager.py
Outdated
```python
if quota_reset_ts:
    other_model_data["quota_reset_ts"] = quota_reset_ts
    # Also set as model cooldown if it's in the future and quota is low
    if quota_reset_ts > now_ts and remaining_fraction <= 0.001:
        model_cooldowns = key_data.setdefault(
            "model_cooldowns", {}
        )
        model_cooldowns[grouped_model] = quota_reset_ts
```
This cooldown-setting logic for grouped models duplicates lines 2148-2154. Consider extracting a helper like _set_cooldown_if_exhausted(key_data, model, quota_reset_ts, now_ts, remaining_fraction) to centralize this logic and reduce maintenance burden.
Address PR Mirrowel#51 review suggestions from mirrobot-agent:

- Extract duplicate cooldown logic into `_set_quota_reset_and_cooldown()` helper method, eliminating code duplication in quota sync flow
- Add documentation explaining the 0.001 (0.1%) threshold meaning: quota is "practically zero" when below this level, not a float precision guard

Refs: Mirrowel#51
How would it know we used up 100%? The only real way of knowing is making a request and getting an error, or polling the API for the current quota. For the same reason, the error/API reset timers are used as definitive - they are the most accurate.

We're still using the same polling of the API to get the current quota. The only problem is that the usage manager is not checking the key and is still stuck seeing the key as available, which leads to the case where it will use the key even though it's already exhausted. So no changes to the polling.
@mirrobot-agent check this PR against the latest dev branch and see if this is needed, as the logic should be robust already.
@Mirrowel, understood! I'll analyze this PR against the latest dev branch to determine if these quota exhaustion detection changes are still necessary. I'll compare:

Back shortly with my findings.
@Mirrowel, I've completed my analysis comparing this PR against the latest dev branch.

Summary

This PR addresses a real gap in the current dev branch logic and is still needed.

Analysis

What the dev branch already has:

What this PR adds that's NOT in dev:

The Gap (Why This PR Is Still Needed)

The dev branch relies entirely on cooldowns to prevent using exhausted credentials. However, there's a timing vulnerability:

Scenario: A credential's quota becomes exhausted locally (request_count reaches max_requests) but:

In this window, the credential has no cooldown set even though its quota is exhausted. The current dev branch will still select this credential, leading to failed requests.

The PR's

Note on the
@mirrobot-agent Google's backend is not precise, so we can go over the max quota value, especially since we use approximate logic here. Additionally, relying on hitting the error to set the cooldown means that if quotas increase, users can go over the old max without their quota being artificially limited to it. Re-evaluate with this new info.
@Mirrowel, thank you for the additional context. I'll re-evaluate this PR with your points about:

Re-analyzing now with this in mind.
@Mirrowel, I've completed my re-evaluation with your additional context. You're right: this PR should NOT be merged as-is.

Revised Analysis

The Problem with
| Change | Status | Notes |
|---|---|---|
| `_is_quota_exhausted` filtering | ❌ Should NOT merge | Contradicts error-driven design |
| `_set_quota_reset_and_cooldown` helper | | Code cleanup, but uses 0.001 threshold vs 0.0 in dev |
| `quota_reset_ts` passthrough | ✅ Already in dev | Dev already accepts `reset_timestamp` param |
The 0.001 threshold change is a subtle difference—dev uses remaining_fraction <= 0.0 (strict zero) while the PR uses <= 0.001 (0.1% tolerance). The strict zero is more aligned with "trust the API" since it only triggers when the API explicitly says quota is exhausted.
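The two criteria only disagree in the narrow band between exactly zero and 0.1% remaining; a quick illustration (function names are hypothetical, for comparison only):

```python
def exhausted_strict(remaining_fraction: float) -> bool:
    return remaining_fraction <= 0.0    # dev: trust the API's explicit zero


def exhausted_tolerant(remaining_fraction: float) -> bool:
    return remaining_fraction <= 0.001  # PR: "practically zero"


# A credential reporting 0.05% quota left is treated differently:
print(exhausted_strict(0.0005), exhausted_tolerant(0.0005))  # False True
# At exactly zero, both agree:
print(exhausted_strict(0.0), exhausted_tolerant(0.0))        # True True
```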
Recommendation
Close this PR without merging. The core feature (_is_quota_exhausted filtering in credential acquisition) conflicts with the error-driven design philosophy. The refactoring portions are minor improvements but introduce a threshold change that loosens the exhaustion criteria.
This analysis was generated by an AI assistant.
As the bot said - closing. Already mostly implemented. |
📝 Summary

Fix quota exhaustion detection to ensure the proxy accurately respects API quota limits and avoids making doomed requests that would result in `StreamedAPIError`.

✨ Changes

- Add `_is_quota_exhausted(key, model)` method to `UsageManager` for consistent quota exhaustion detection across all code paths
- Update `get_available_credentials_for_model` and both priority/non-priority paths in `acquire_key` to skip exhausted credentials
- Enhance `update_quota_baseline` to accept optional `quota_reset_ts` from API responses and auto-set model cooldowns when quota is depleted

📁 Files Changed

- `src/rotator_library/usage_manager.py`
- `src/rotator_library/providers/utilities/antigravity_quota_tracker.py`

🧪 Testing

📋 Result

The proxy will now accurately respect quota limits fetched from the API and avoid making requests that would result in `StreamedAPIError`. Previously, credentials with 100% consumed quota could still be selected if no explicit cooldown was set yet.

Important

Centralized quota exhaustion detection in `UsageManager` to prevent using exhausted credentials and enhance quota synchronization.

- `_is_quota_exhausted(key, model)` in `UsageManager` for consistent quota exhaustion detection.
- `get_available_credentials_for_model` and `acquire_key` to skip exhausted credentials.
- `update_quota_baseline` to accept `quota_reset_ts` and auto-set cooldowns.
- `usage_manager.py`: Core logic for quota exhaustion detection.
- `antigravity_quota_tracker.py`: Pass reset timestamp to quota baseline update.

This description was created by

for c361cdb. You can customize this summary. It will automatically update as commits are pushed.