Skip to content

feat(openai-agents): capture tool call arguments + result on tool spans#4063

Draft
hansmire wants to merge 1 commit intotraceloop:mainfrom
hansmire:fix/openai-agents-tool-input-output
Draft

feat(openai-agents): capture tool call arguments + result on tool spans#4063
hansmire wants to merge 1 commit intotraceloop:mainfrom
hansmire:fix/openai-agents-tool-input-output

Conversation

@hansmire
Copy link
Copy Markdown

@hansmire hansmire commented Apr 29, 2026

Summary

FunctionSpanData carries the tool's arguments (input) and return value (output) on every function-tool invocation, but the openai-agents instrumentor never read either — tool spans only carried the tool name and type. Braintrust's own native Agents-SDK processor (braintrust/wrappers/openai.py:_function_log_data) logs both as top-level input/output fields; OTel-based integrations were the outlier.

This PR captures both values and emits them under two attribute pairs so they show up both as semantic-convention metadata and as first-class Input/Output panels in trace UIs.

Before / After

Same get_customer_info.tool span, same Braintrust workspace, rendered side-by-side:

PR #4063 before/after

Before: Output panel renders the tool's JSON schema ({"tool": {"name": ..., "strict_json_schema": true, "type": "function"}}) — a tool definition leaked into the output slot, not the tool's return value.
After: Output panel renders the real tool call result (hasChatEmailed: false, mailingAddress: "123 Main St, …", partyId: "risk_1a", …).

Changes (_hooks.py)

  1. Capture FunctionSpanData.input / .output in on_span_end and emit them as:
    • gen_ai.tool.call.arguments / gen_ai.tool.call.result — OTel GenAI semconv (for OTel-aware backends).
    • traceloop.entity.input / traceloop.entity.output — Traceloop convention used by the LangChain instrumentor, which trace backends (Braintrust etc.) already map to the span's top-level Input / Output panels.
  2. Drop the three gen_ai.completion.tool.* attributes previously set on tool spans at start time. They were a tool definition blob (name / type / strict_json_schema) that some backends' ingestion logic treated as the tool span's Output, producing {"tool": {...}} in the Output panel instead of the actual return value. The equivalent information is already carried by gen_ai.tool.name / gen_ai.tool.type + TRACELOOP_SPAN_KIND.
  3. Add _serialize_span_payload helper — serialises the FunctionSpanData input/output reliably:
    • string passthrough,
    • model_dump() for pydantic models (e.g. Agents SDK ToolOutputText) and for lists of pydantic models,
    • json.dumps(..., default=str) fallback,
    • str() as last resort.
  4. Gated on trace_content = should_send_prompts() so content-sensitive deployments that opt out of prompt capture also get the tool payload suppressed.

Tests

Extended the existing VCR-backed test_agent_with_function_tool_spans to assert:

  • GEN_AI_TOOL_CALL_ARGUMENTS / GEN_AI_TOOL_CALL_RESULT are set and non-empty.
  • TRACELOOP_ENTITY_INPUT / TRACELOOP_ENTITY_OUTPUT are set and match the GenAI pair.
  • Tool arguments reference the query's city name (end-to-end sanity).

All 10 tests in tests/test_openai_agents.py pass locally. uv run ruff check clean.

Notes

Part of a small series of openai-agents parity fixes (#4061 cached_tokens + reasoning_tokens, #4062 tool span type + duration). Each stands alone off main and can be merged in any order.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc8df7b5-bbfe-4f10-872c-2ec2a0d29b66

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hansmire hansmire force-pushed the fix/openai-agents-tool-input-output branch from c18ada4 to 14fc3ac Compare April 29, 2026 03:06
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Max Hansmire seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

The `FunctionSpanData` emitted by the OpenAI Agents SDK for every
function-tool invocation carries the tool's arguments (`input`) and
return value (`output`).  This instrumentor wasn't reading either
— the resulting tool spans only carried the tool name and type, so
downstream trace UIs (Braintrust in particular) had no way to render
what actually happened in the tool call.

The OpenAI Agents SDK's own native tracing processor (e.g. Braintrust's
`BraintrustTracingProcessor`, `_function_log_data` at
`braintrust/wrappers/openai.py:158-162`) logs both fields directly;
OTel-based integrations were the outlier.

This change reads `FunctionSpanData.input` / `.output` in `on_span_end`
and emits them as the standard OTel GenAI semconv attributes:

  * `gen_ai.tool.call.arguments`  (via `GEN_AI_TOOL_CALL_ARGUMENTS`)
  * `gen_ai.tool.call.result`     (via `GEN_AI_TOOL_CALL_RESULT`)

Serialisation: if either value is already a string it's passed through,
otherwise it's `json.dumps`-ed with a `str()` fallback for
non-JSON-serialisable payloads.

Gated on `should_send_prompts()` so that content-sensitive deployments
opting out of prompt capture also get the tool payload suppressed.

Includes a regression assertion in the existing VCR-backed
`test_agent_with_function_tool_spans` integration test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants