feat(openai-agents): capture tool call arguments + result on tool spans #4063
hansmire wants to merge 1 commit into traceloop:main
The `FunctionSpanData` emitted by the OpenAI Agents SDK for every function-tool invocation carries the tool's arguments (`input`) and return value (`output`). This instrumentor wasn't reading either — the resulting tool spans only carried the tool name and type, so downstream trace UIs (Braintrust in particular) had no way to render what actually happened in the tool call. Native tracing processors for the OpenAI Agents SDK (e.g. Braintrust's `BraintrustTracingProcessor`, `_function_log_data` at `braintrust/wrappers/openai.py:158-162`) log both fields directly; OTel-based integrations were the outlier.

This change reads `FunctionSpanData.input` / `.output` in `on_span_end` and emits them as the standard OTel GenAI semconv attributes:

* `gen_ai.tool.call.arguments` (via `GEN_AI_TOOL_CALL_ARGUMENTS`)
* `gen_ai.tool.call.result` (via `GEN_AI_TOOL_CALL_RESULT`)

Serialisation: if either value is already a string it's passed through; otherwise it's `json.dumps`-ed with a `str()` fallback for non-JSON-serialisable payloads. Gated on `should_send_prompts()`, so content-sensitive deployments opting out of prompt capture also get the tool payload suppressed. Includes a regression assertion in the existing VCR-backed `test_agent_with_function_tool_spans` integration test.
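The emit-and-gate logic described above could be sketched roughly as follows. This is a hypothetical reconstruction, not the PR's actual code: the helper names (`_to_attr_value`, `set_tool_payload_attributes`) and the fake span/data objects are illustrative, while the attribute names match the semconv strings the PR cites.

```python
import json

# Attribute names from the PR (GEN_AI_TOOL_CALL_ARGUMENTS / GEN_AI_TOOL_CALL_RESULT).
GEN_AI_TOOL_CALL_ARGUMENTS = "gen_ai.tool.call.arguments"
GEN_AI_TOOL_CALL_RESULT = "gen_ai.tool.call.result"


def _to_attr_value(value):
    """String values pass through unchanged; everything else is JSON-encoded,
    falling back to str() for non-JSON-serialisable payloads."""
    if value is None:
        return None
    if isinstance(value, str):
        return value
    try:
        return json.dumps(value)
    except (TypeError, ValueError):
        return str(value)


def set_tool_payload_attributes(span, span_data, trace_content):
    """Sketch of the on_span_end behaviour: emit the tool call's input/output
    only when prompt capture is enabled (trace_content mirrors should_send_prompts())."""
    if not trace_content:
        return
    args = _to_attr_value(getattr(span_data, "input", None))
    result = _to_attr_value(getattr(span_data, "output", None))
    if args is not None:
        span.set_attribute(GEN_AI_TOOL_CALL_ARGUMENTS, args)
    if result is not None:
        span.set_attribute(GEN_AI_TOOL_CALL_RESULT, result)
```

With `trace_content=False` the function returns before touching the span, which is how opting out of prompt capture also suppresses the tool payload.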
Summary
`FunctionSpanData` carries the tool's arguments (`input`) and return value (`output`) on every function-tool invocation, but the openai-agents instrumentor never read either — tool spans only carried the tool name and type. Braintrust's own native Agents-SDK processor (`braintrust/wrappers/openai.py`, `_function_log_data`) logs both as top-level input/output fields; OTel-based integrations were the outlier.

This PR captures both values and emits them under two attribute pairs so they show up both as semantic-convention metadata and as first-class Input/Output panels in trace UIs.
Before / After
Same `get_customer_info.tool` span, same Braintrust workspace, rendered side-by-side:

* Before: the Output panel renders the tool's JSON schema (`{"tool": {"name": ..., "strict_json_schema": true, "type": "function"}}`) — a tool definition leaked into the output slot, not the tool's return value.
* After: the Output panel renders the real tool call result (`hasChatEmailed: false`, `mailingAddress: "123 Main St, …"`, `partyId: "risk_1a"`, …).

Changes (`_hooks.py`)
* Read `FunctionSpanData.input` / `.output` in `on_span_end` and emit them as:
  * `gen_ai.tool.call.arguments` / `gen_ai.tool.call.result` — OTel GenAI semconv (for OTel-aware backends).
  * `traceloop.entity.input` / `traceloop.entity.output` — the Traceloop convention used by the LangChain instrumentor, which trace backends (Braintrust etc.) already map to the span's top-level Input / Output panels.
* Dropped the `gen_ai.completion.tool.*` attributes previously set on tool spans at start time. They were a tool definition blob (name / type / strict_json_schema) that some backends' ingestion logic treated as the tool span's Output, producing `{"tool": {...}}` in the Output panel instead of the actual return value. The equivalent information is already carried by `gen_ai.tool.name` / `gen_ai.tool.type` + `TRACELOOP_SPAN_KIND`.
* New `_serialize_span_payload` helper — serialises the FunctionSpanData input/output reliably: `model_dump()` for pydantic models (e.g. the Agents SDK's `ToolOutputText`) and for lists of pydantic models, `json.dumps(..., default=str)` as fallback, `str()` as last resort.
* Gated on `trace_content = should_send_prompts()` so content-sensitive deployments that opt out of prompt capture also get the tool payload suppressed.
Tests
Extended the existing VCR-backed `test_agent_with_function_tool_spans` to assert:

* `GEN_AI_TOOL_CALL_ARGUMENTS` / `GEN_AI_TOOL_CALL_RESULT` are set and non-empty.
* `TRACELOOP_ENTITY_INPUT` / `TRACELOOP_ENTITY_OUTPUT` are set and match the GenAI pair.

All 10 tests in `tests/test_openai_agents.py` pass locally. `uv run ruff check` is clean.
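The two assertion groups above might look roughly like this. The attribute-name constants are spelled out inline, and the span is assumed to expose an OpenTelemetry-style `attributes` mapping (as a `ReadableSpan` from the in-memory exporter does); the exact test wiring in the PR may differ.

```python
# Hypothetical shape of the added assertions.
GEN_AI_TOOL_CALL_ARGUMENTS = "gen_ai.tool.call.arguments"
GEN_AI_TOOL_CALL_RESULT = "gen_ai.tool.call.result"
TRACELOOP_ENTITY_INPUT = "traceloop.entity.input"
TRACELOOP_ENTITY_OUTPUT = "traceloop.entity.output"


def assert_tool_span_payload(tool_span):
    attrs = tool_span.attributes
    # Both GenAI semconv attributes are set and non-empty.
    assert attrs[GEN_AI_TOOL_CALL_ARGUMENTS]
    assert attrs[GEN_AI_TOOL_CALL_RESULT]
    # The Traceloop pair mirrors the GenAI pair.
    assert attrs[TRACELOOP_ENTITY_INPUT] == attrs[GEN_AI_TOOL_CALL_ARGUMENTS]
    assert attrs[TRACELOOP_ENTITY_OUTPUT] == attrs[GEN_AI_TOOL_CALL_RESULT]
```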
Notes
Part of a small series of `openai-agents` parity fixes (#4061 cached_tokens + reasoning_tokens, #4062 tool span type + duration). Each stands alone off `main` and can be merged in any order.