Skip to content

Commit 0711ec3

Browse files
idiotsjzouyonghe
andauthored
Fix/fix: resolve MCP tools race condition causing 'completion 无法解析' error (#5534)
* fix: resolve MCP tools race condition causing 'completion 无法解析' error - Wait for MCP client initialization to complete before accepting requests - Add Future-based synchronization in init_mcp_clients() - Prevent tool_calls from being rejected due to empty func_list - Improve error logging for MCP initialization failures Fixes race condition where AI attempts to call MCP tools before they are registered, resulting in 'API 返回的 completion 无法解析' exceptions. The issue occurred because: 1. MCP clients were initialized asynchronously without waiting 2. System accepted user requests immediately after startup 3. AI received empty tool list and attempted to call non-existent tools 4. Tool matching failed, causing parsing errors This fix ensures all MCP tools are loaded before the system processes any requests that might use them. * perf: add timeout and better error handling for MCP initialization - Add 20-second total timeout to prevent slow MCP servers from blocking startup - Show detailed configuration info when MCP initialization fails - List all failed services in a summary warning - Gracefully handle timeout by using already-completed services This ensures that even if some MCP servers are slow or unreachable, the system will start within a reasonable time and provide clear feedback about which services failed and why. * refactor: simplify MCP init orchestration and improve log security - Replace Future-based sync with asyncio.wait + name→task mapping - Explicitly cancel timed-out tasks after 20s timeout - Downgrade sensitive config details (command/args/URL) to debug level - Move urllib.parse import to top-level * fix: prevent initialized MCP clients from being cleaned up on timeout - Do not cancel pending tasks on timeout; let them continue running in the background waiting for the termination signal (event.set()), so successfully initialized services remain available - Track initialization state with a flag to distinguish init failures from post-init cancellations in _init_mcp_client_task_wrapper * fix: restore task cancellation on timeout per review feedback Pending tasks in asyncio.wait are tasks that have NOT completed initialization within 20s, so cancelling them is safe and correct. * fix: separate init signal from client lifetime in MCP task wrapper The previous design awaited task completion, but tasks only finish on shutdown (after event.wait()), causing asyncio.wait to always hit the 20s timeout and cancel all clients. Fix: introduce a dedicated ready_event that is set immediately after _init_mcp_client completes. init_mcp_clients now waits only for ready_event (with 20s timeout), while the long-lived client task continues running in the background until shutdown_event is set. This ensures startup returns promptly once clients are ready. * security: redact sensitive MCP config from debug logs Only log executable name and argument count instead of full command/args to avoid leaking tokens or credentials even at debug level. * refactor: use McpClientInfo dataclass and MCP_INIT_TIMEOUT constant - Extract MCP_INIT_TIMEOUT = 20.0 as a named module-level constant - Replace tuple-based client_info with _McpClientInfo dataclass to eliminate index-based access and improve readability - Remove _wait_ready helper; use asyncio.create_task(event.wait()) directly - Await cancelled tasks after timeout to prevent lingering background tasks and unobserved exceptions * fix: handle CancelledError and clean up wait_tasks on timeout - Catch asyncio.CancelledError separately in _init_mcp_client_task_wrapper so ready_event.set() is always called (Python 3.8+ CancelledError inherits BaseException, not Exception) - Cancel and await lingering wait_tasks after timeout to prevent them from hanging indefinitely when ready_event is never set * fix: align enable_mcp_server with new wrapper API and fix security/config issues - Fix enable_mcp_server to pass shutdown_event + ready_event instead of ready_future, matching _init_mcp_client_task_wrapper's current signature - Cancel and await init_task on timeout; clean up mcp_client_event on failure - Read MCP_INIT_TIMEOUT from env var ASTRBOT_MCP_INIT_TIMEOUT (default 20s) so operators can tune it without code changes - Strip userinfo from URL in debug log (use hostname+port only, not netloc) to avoid leaking credentials embedded in URLs * refactor: register mcp_client_event only after successful init in enable_mcp_server Move self.mcp_client_event[name] assignment to after initialization succeeds, so callers never observe a stale event for a failed client. * fix: harden MCP init state handling and timeout parsing * fix: improve MCP timeout and post-init error observability * refactor: simplify MCP init lifecycle orchestration * refactor: simplify MCP init flow and cap timeout values * fix: refine mcp timeout handling and lifecycle task tracking * fix: harden mcp shutdown and timeout source logging * refactor: simplify mcp runtime registry and timeout flow * fix: keep mcp init summary return contract * refactor: streamline mcp lifecycle and init errors * refactor: unify mcp lifecycle wait handling * refactor: simplify mcp runtime ownership and timeout resolution * fix: harden mcp shutdown waiting and startup signaling * refactor: streamline mcp lifecycle and shutdown errors * refactor: harden mcp runtime access and shutdown * fix: ensure mcp client cleanup and clarify views * refactor: cache mcp client view and guard startup * refactor: simplify mcp init cleanup and runtime lock * refactor: reduce mcp runtime duplication * refactor: reuse mcp cleanup and client view --------- Co-authored-by: idiotsj <[email protected]> Co-authored-by: 邹永赫 <[email protected]>
1 parent 0dbe32e commit 0711ec3

3 files changed

Lines changed: 470 additions & 123 deletions

File tree

0 commit comments

Comments
 (0)