-
Notifications
You must be signed in to change notification settings - Fork 489
Enforce per-tool timeouts and enhanced circuit breaker plugin #2569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kevalmahajan
wants to merge
1
commit into
main
Choose a base branch
from
2078_circuit_breakers_tool_invocation
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,531
β41
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
f09e918 to
2285ea2
Compare
Member
Code reviewFound 1 issue:
The counter at line 2882 uses tool_timeout_counter.labels(tool_name=name).inc()But the log at line 2870 uses message=f"REST tool invocation timed out: {tool_name_computed}",This same inconsistency appears in the SSE and StreamableHTTP timeout handlers (lines 3147 and 3291 use |
2285ea2 to
3ef8dbb
Compare
Open
7 tasks
Implement strict per-tool timeout enforcement for all transports (REST, SSE, StreamableHTTP, A2A) and enhance the CircuitBreakerPlugin with half-open states, retry headers, and granular configuration. Changes: - Wrap all tool invocations in asyncio.wait_for with effective_timeout - Add per-tool timeout_ms support (ms to seconds conversion) - Add half-open state for circuit breaker recovery testing - Add half_open_in_flight flag to prevent concurrent probe requests - Add retry_after_seconds in violation response for rate limiting - Add tool_timeout_total and circuit_breaker_open_total Prometheus metrics - Add cb_timeout_failure context flag for timeout detection in plugins - Add tool_overrides for per-tool circuit breaker configuration - Handle both asyncio.TimeoutError and httpx.TimeoutException - Log actual elapsed time instead of configured timeout Fixes applied during review: - Fix _is_error() to detect camelCase isError from model_dump(by_alias=True) - Fix half-open probe guard: only check when st.half_open is True - Add stale-probe timeout to prevent permanent wedge if plugin blocks - Add timeout enforcement to A2A tool invocations - Call tool_post_invoke on exceptions so circuit breaker tracks failures - Add ToolTimeoutError subclass to distinguish timeouts from other errors - Only skip post_invoke for ToolTimeoutError (not all ToolInvocationError) - Update README to document isError camelCase support Timeout precedence: 1. Per-tool timeout_ms (if set and non-zero) 2. Global TOOL_TIMEOUT setting (default: 60s) Closes #2078 Co-authored-by: Keval Mahajan <[email protected]> Signed-off-by: Mihai Criveti <[email protected]>
3ef8dbb to
5efaabb
Compare
Member
Code Review Changes AppliedRebased onto High Severity Fixes
Medium Severity Fixes
Low Severity Fixes
New Tests Added
Summary of ChangesAll 48 tests pass. Ready for merge. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
β¨ Feature / Enhancement PR
π Epic / Issue
Closes #2078
π Summary (1-2 sentences)
This PR implements strict per-tool timeout enforcement for all transports (REST, SSE, StreamableHTTP) and significantly enhances the
CircuitBreakerPluginwith half-open states, retry headers, and granular configuration overrides.All failure logic is decoupled via the Plugin System:
ToolService: Executes requests, enforces strict limits (Timeouts), reports status.
CircuitBreakerPlugin: Decides if a request should be attempted based on history.
Configurations for timeout:
# Env Variable: TOOL_TIMEOUTTool Schema:Example Precedence Calculation:
π§ͺ Checks
make lintpassesmake testpassesπ Notes (optional)
Key Changes
asyncio.wait_for.timeout_msoverrides global settings.plugins/circuit_breaker.py):retry_after_secondsin violation response.cb_timeout_failurecontext flag.tool_timeout_totalandcircuit_breaker_open_totalPrometheus counters.flowchart TD Client -->|Invoke Tool| Gateway(ToolService) Gateway -->|1. check| CB[CircuitBreakerPlugin] CB -- Open? --> Reject[429 / Blocked] CB -- Closed? --> Gateway Gateway -->|2. Invoke w/ Timeout| Tool[External Tool] Tool -- Success --> Gateway Gateway -->|3. Record Success| CB Tool -- Timeout/Error --> Gateway Gateway -- Flag: cb_timeout_failure --> CB Gateway -->|4. Record Failure| CB CB -- Trip Threshold? --> CBState[Open Circuit]Feature Map & Implementation Details:
asyncio.wait_for. Prioritizes tool-specifictimeout_msover global default.TimeoutErrorand sets context flag. Plugin reads flag and counts as failure even without HTTP status.open_until - nowand returns exact seconds in violation details.tool_overridesmap allows custom thresholds for critical tools (e.g., higher patience for Payment APIs).tool_timeout_totalandcircuit_breaker_open_totalPrometheus counters.Verification
Unit Tests: Added TestToolTimeoutsAndRetries covering state transitions, timeout precedence, and plugin logic.