Commit a028028

feat(plugins): enforce per-tool timeouts and enhanced circuit breaker (IBM#2569)
Implement strict per-tool timeout enforcement for all transports (REST, SSE, StreamableHTTP, A2A) and enhance the CircuitBreakerPlugin with half-open states, retry headers, and granular configuration.

Changes:
- Wrap all tool invocations in asyncio.wait_for with effective_timeout
- Add per-tool timeout_ms support (ms to seconds conversion)
- Add half-open state for circuit breaker recovery testing
- Add half_open_in_flight flag to prevent concurrent probe requests
- Add retry_after_seconds in violation response for rate limiting
- Add tool_timeout_total and circuit_breaker_open_total Prometheus metrics
- Add cb_timeout_failure context flag for timeout detection in plugins
- Add tool_overrides for per-tool circuit breaker configuration
- Handle both asyncio.TimeoutError and httpx.TimeoutException
- Log actual elapsed time instead of configured timeout

Fixes applied during review:
- Fix _is_error() to detect camelCase isError from model_dump(by_alias=True)
- Fix half-open probe guard: only check when st.half_open is True
- Add stale-probe timeout to prevent permanent wedge if plugin blocks
- Add timeout enforcement to A2A tool invocations
- Call tool_post_invoke on exceptions so circuit breaker tracks failures
- Add ToolTimeoutError subclass to distinguish timeouts from other errors
- Only skip post_invoke for ToolTimeoutError (not all ToolInvocationError)
- Set error_message and span attributes for ToolTimeoutError observability
- Update README to document isError camelCase support

Timeout precedence:
1. Per-tool timeout_ms (if set and non-zero)
2. Global TOOL_TIMEOUT setting (default: 60s)

Closes IBM#2078

Signed-off-by: Mihai Criveti <[email protected]>
Co-authored-by: Mihai Criveti <[email protected]>
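
The timeout precedence and the asyncio.wait_for wrapping described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper names `effective_timeout` and `invoke_with_timeout`, the `GLOBAL_TOOL_TIMEOUT_S` constant, and the local `ToolTimeoutError` class are assumptions made for the example, and only the asyncio path is shown (httpx.TimeoutException handling is omitted).

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional

# Assumption: stands in for the global TOOL_TIMEOUT setting (stated default: 60s).
GLOBAL_TOOL_TIMEOUT_S = 60.0


class ToolTimeoutError(Exception):
    """Hypothetical stand-in for the ToolTimeoutError subclass named in the commit."""


def effective_timeout(timeout_ms: Optional[int],
                      global_timeout_s: float = GLOBAL_TOOL_TIMEOUT_S) -> float:
    """Return the timeout in seconds: per-tool timeout_ms if set and non-zero, else global."""
    if timeout_ms:  # None and 0 both fall back to the global value
        return timeout_ms / 1000.0  # ms to seconds conversion
    return global_timeout_s


async def invoke_with_timeout(call: Callable[[], Awaitable[Any]],
                              timeout_ms: Optional[int] = None) -> Any:
    """Await call() under the effective timeout and report the actual elapsed time."""
    timeout_s = effective_timeout(timeout_ms)
    start = time.monotonic()
    try:
        return await asyncio.wait_for(call(), timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        elapsed = time.monotonic() - start  # elapsed time, not the configured timeout
        raise ToolTimeoutError(f"tool timed out after {elapsed:.2f}s") from exc
```

Raising a dedicated exception type here mirrors the stated goal of distinguishing timeouts from other invocation errors, so that callers can treat the two cases differently.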
1 parent 3ee68db commit a028028
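
The half-open behaviour listed in the commit message (a single probe at a time, a stale-probe timeout, and a retry_after_seconds hint) could look roughly like the sketch below. It is an illustration under stated assumptions, not the CircuitBreakerPlugin itself: the class names, the `allow`/`record` methods, the default thresholds, and the injectable `clock` are all hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BreakerState:
    failures: int = 0
    open_until: Optional[float] = None  # None means the circuit is not open
    half_open: bool = False
    half_open_in_flight: bool = False
    probe_started: float = 0.0


class CircuitBreaker:
    """Hypothetical breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 probe_timeout_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.probe_timeout_seconds = probe_timeout_seconds
        self.clock = clock
        self.state = BreakerState()

    def allow(self) -> Tuple[bool, Optional[int]]:
        """Return (allowed, retry_after_seconds)."""
        st, now = self.state, self.clock()
        if st.open_until is not None:
            if now < st.open_until:
                return False, max(1, math.ceil(st.open_until - now))
            st.open_until = None  # cooldown elapsed: start recovery testing
            st.half_open = True
        if st.half_open:  # the probe guard applies only in the half-open state
            probe_age = now - st.probe_started
            if st.half_open_in_flight and probe_age < self.probe_timeout_seconds:
                return False, max(1, math.ceil(self.probe_timeout_seconds - probe_age))
            # Either no probe is running or the previous one went stale.
            st.half_open_in_flight = True
            st.probe_started = now
        return True, None

    def record(self, success: bool) -> None:
        """Record a call outcome; a success closes the circuit, a failed probe reopens it."""
        st = self.state
        if success:
            self.state = BreakerState()
            return
        st.failures += 1
        if st.half_open or st.failures >= self.failure_threshold:
            st.open_until = self.clock() + self.cooldown_seconds
            st.half_open = False
            st.half_open_in_flight = False
```

The stale-probe check is what keeps the breaker from wedging permanently if a probe never reports back: once the probe is older than `probe_timeout_seconds`, the next caller is allowed through as a fresh probe.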

File tree

6 files changed

+1536
-41
lines changed

mcpgateway/services/metrics.py

Lines changed: 15 additions & 1 deletion
@@ -44,12 +44,26 @@
 
 # Third-Party
 from fastapi import Response, status
-from prometheus_client import Gauge, REGISTRY
+from prometheus_client import Counter, Gauge, REGISTRY
 from prometheus_fastapi_instrumentator import Instrumentator
 
 # First-Party
 from mcpgateway.config import settings
 
+# Global Metrics
+# Exposed for import by services/plugins to increment counters
+tool_timeout_counter = Counter(
+    "tool_timeout_total",
+    "Total number of tool invocation timeouts",
+    ["tool_name"],
+)
+
+circuit_breaker_open_counter = Counter(
+    "circuit_breaker_open_total",
+    "Total number of times circuit breaker opened",
+    ["tool_name"],
+)
+
 
 def setup_metrics(app):
     """

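For context on how the two counters added in mcpgateway/services/metrics.py might be used, here is a small sketch with prometheus_client. The counter names, help strings, and the tool_name label match the diff; the private `CollectorRegistry`, the `record_timeout` and `record_breaker_open` helpers, and the "slow_tool" label value are assumptions for illustration only. The diff registers both counters on the default registry.

```python
from prometheus_client import CollectorRegistry, Counter

# Assumption: a private registry keeps this sketch from clashing with metrics
# registered elsewhere in the same process.
registry = CollectorRegistry()

tool_timeout_counter = Counter(
    "tool_timeout_total",
    "Total number of tool invocation timeouts",
    ["tool_name"],
    registry=registry,
)

circuit_breaker_open_counter = Counter(
    "circuit_breaker_open_total",
    "Total number of times circuit breaker opened",
    ["tool_name"],
    registry=registry,
)


def record_timeout(tool_name: str) -> None:
    """Hypothetical helper: count one timed-out invocation for a tool."""
    tool_timeout_counter.labels(tool_name=tool_name).inc()


def record_breaker_open(tool_name: str) -> None:
    """Hypothetical helper: count one closed-to-open transition for a tool."""
    circuit_breaker_open_counter.labels(tool_name=tool_name).inc()


# Two timeouts and one breaker trip for the same (made-up) tool.
record_timeout("slow_tool")
record_timeout("slow_tool")
record_breaker_open("slow_tool")
```

Labelling by tool_name gives one time series per tool, which makes it possible to see which tools time out or trip the breaker most often.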