Fix OpenTelemetry tool spans missing parent ID in CodeAgent#2187
Fix OpenTelemetry tool spans missing parent ID in CodeAgent#2187Ricardo-M-L wants to merge 1 commit intohuggingface:mainfrom
Conversation
The `timeout` decorator in `local_python_executor.py` uses `ThreadPoolExecutor` to run code in a worker thread. Since OpenTelemetry span context is stored in `contextvars`, it was lost when crossing the thread boundary, causing tool spans created during CodeAgent execution to become root spans with no parent. Fix by capturing the current context via `contextvars.copy_context()` and using `ctx.run()` to execute the function in the worker thread, preserving the span hierarchy. This is the same pattern already used in `ToolCallingAgent.process_tool_calls` for parallel tool execution. Fixes huggingface#1961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VANDRANKI
left a comment
There was a problem hiding this comment.
The root cause analysis is correct. ThreadPoolExecutor in Python does not copy the caller's contextvars context into worker threads, so copy_context() + ctx.run() is exactly the right fix.
A few observations:
Consistency with existing code - this is the same pattern already used in ToolCallingAgent.process_tool_calls for parallel tool execution. Good to have the two code paths aligned.
Test design - the two new tests cover the right things: (1) a unit test verifying contextvar propagation through the raw decorator, and (2) an integration-style test through evaluate_python_code. Both are clear about what they are asserting and why (the docstrings explain the OTel connection).
Return value - ctx.run(func, *args, **kwargs) returns the return value of func(*args, **kwargs), so existing callers are unaffected.
One question: is there an async version of the timeout helper? If there is and it also uses ThreadPoolExecutor, the same fix would apply there.
Overall this is a clean, well-motivated fix with good test coverage. LGTM from my side.
VANDRANKI
left a comment
There was a problem hiding this comment.
Clean, targeted fix. contextvars.copy_context() captures a snapshot of the calling thread's context, and ctx.run(func, ...) executes the worker in that snapshot. Without this, ThreadPoolExecutor starts each worker in an empty context, so OpenTelemetry's active span ContextVar (and any other per-request contextvars) is absent in the worker - tool spans end up with no parent.
Tests are the best part. test_timeout_propagates_contextvars directly verifies the propagation at the @timeout decorator level. test_timeout_propagates_contextvars_to_evaluate_python_code goes end-to-end through evaluate_python_code, which is the actual call site in CodeAgent. Both tests use a test ContextVar rather than OTel internals, which is the right approach - it keeps the tests dependency-free while still verifying the mechanism.
Interaction with PR #2199: PR #2199 also modifies the timeout function in local_python_executor.py (replacing the with ThreadPoolExecutor block with manual executor.shutdown(wait=not timed_out)). These two PRs touch the same lines and will conflict. The context propagation fix in this PR should be carried forward into whichever version of the timeout function lands. Worth coordinating with the author of #2199.
LGTM on the contextvars fix itself.
Summary
CodeAgent, causing them to appear as root spans in OpenTelemetry tracestimeoutdecorator usesThreadPoolExecutorwhich runs code in a new thread wherecontextvars(used by OpenTelemetry for span context) are not propagatedcontextvars.copy_context()+ctx.run()to propagate context to the worker thread — the same pattern already used inToolCallingAgent.process_tool_callsfor parallel tool executionDetails
When
CodeAgentexecutes tools, it goes throughevaluate_python_code()which is wrapped by thetimeoutdecorator. This decorator submits the function to aThreadPoolExecutor, creating a new thread. Since OpenTelemetry stores the active span incontextvars, the span context is lost in the new thread, and any tool spans created during execution have no parent.The fix captures the current context before submitting to the thread pool and uses
ctx.run()to execute within that context, preserving the fullcontextvarsstate including OTel spans.Before (broken):
After (fixed):
Fixes #1961
Test plan
test_timeout_propagates_contextvars— verifies thetimeoutdecorator propagatescontextvarsto the worker threadtest_timeout_propagates_contextvars_to_evaluate_python_code— verifies context propagation throughevaluate_python_codewith timeouttest_timeout_propagates_contextvars_with_local_executor— end-to-end test withLocalPythonExecutorsimulating real CodeAgent tool executionTestTimeouttests pass (13/13)🤖 Generated with Claude Code