feat(openai-agents): record response-identification metadata on LLM spans #4065
hansmire wants to merge 1 commit into traceloop:main
feat(openai-agents): record response-identification metadata on LLM spans
The Responses API `Response` object carries several fields that downstream
trace backends rely on for turn chaining, model-version debugging and
reasoning/service-tier visibility, but that this instrumentor dropped on
the floor. Concretely:
* `response.id` — `gen_ai.response.id`
* `response.model` — `gen_ai.response.model`
(kept existing `gen_ai.request.model` for back-compat)
* `response.status` — `gen_ai.response.status`
* `response.previous_response_id`
— `gen_ai.request.previous_response_id`
* `response.service_tier` — `gen_ai.openai.request.service_tier`
* `response.reasoning.effort` — `gen_ai.request.reasoning_effort`
* `response.reasoning.summary` — `gen_ai.request.reasoning_summary`
For comparison, Braintrust's native Agents SDK processor surfaces the
full `response.model_dump(exclude={"input","output","metadata","usage"})`
on every LLM span — which is how its UI shows turn-by-turn chains and
the exact model version that served each request. Previously
openllmetry's equivalent span carried only `temperature`, `top_p`,
`max_tokens` and a conflated `gen_ai.request.model`.
All additions are defensive: when a field is missing / None on the
response, no attribute is emitted (no stringified "None" values).
Fields are set via existing OTel semconv constants where available
(`GEN_AI_RESPONSE_ID`, `GEN_AI_RESPONSE_MODEL`,
`GEN_AI_OPENAI_REQUEST_SERVICE_TIER`, `LLM_REQUEST_REASONING_EFFORT`,
`LLM_REQUEST_REASONING_SUMMARY`) and as string literals for the two
fields without published constants yet (`gen_ai.response.status`,
`gen_ai.request.previous_response_id`).
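A minimal sketch of that defensive pattern (illustrative only, not the exact diff; it assumes the helper returns a plain dict of attributes, and the keys are shown as the string values the semconv constants resolve to):

```python
# Illustrative sketch of the defensive extraction described above.
# The helper shape is hypothetical; the real change uses the semconv
# constants listed above where they exist.
def _extract_response_attributes(response) -> dict:
    attributes = {}

    def set_if_present(key, value):
        # Skip missing / None fields entirely; never emit a stringified "None".
        if value is not None:
            attributes[key] = value

    set_if_present("gen_ai.response.id", getattr(response, "id", None))
    set_if_present("gen_ai.response.model", getattr(response, "model", None))
    # No published semconv constants yet for these two, hence string literals.
    set_if_present("gen_ai.response.status", getattr(response, "status", None))
    set_if_present(
        "gen_ai.request.previous_response_id",
        getattr(response, "previous_response_id", None),
    )
    set_if_present(
        "gen_ai.openai.request.service_tier",
        getattr(response, "service_tier", None),
    )
    reasoning = getattr(response, "reasoning", None)
    if reasoning is not None:
        set_if_present(
            "gen_ai.request.reasoning_effort", getattr(reasoning, "effort", None)
        )
        set_if_present(
            "gen_ai.request.reasoning_summary", getattr(reasoning, "summary", None)
        )
    return attributes
```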
Includes two direct unit tests on `_extract_response_attributes` that
pin the attribute mapping contract and guard against regressions when
a field is absent.
Summary
The Responses API `Response` object carries several fields that downstream trace backends rely on for turn chaining, model-version debugging, and reasoning/service-tier visibility, but that this instrumentor dropped on the floor. This PR plumbs them through to OTel span attributes on every LLM span.

| Response field | Span attribute |
| --- | --- |
| `response.id` | `gen_ai.response.id` |
| `response.model` | `gen_ai.response.model` (kept `gen_ai.request.model` for back-compat) |
| `response.status` | `gen_ai.response.status` |
| `response.previous_response_id` | `gen_ai.request.previous_response_id` |
| `response.service_tier` | `gen_ai.openai.request.service_tier` |
| `response.reasoning.effort` | `gen_ai.request.reasoning_effort` |
| `response.reasoning.summary` | `gen_ai.request.reasoning_summary` |

All additions are defensive: when a field is missing / None on the response, no attribute is emitted (no stringified `"None"` values — pinned by a regression test).
Before / After

Same 529 eval, same agent, same LLM-span detail panel in the Braintrust trace UI:

* Before: the LLM span's Metadata panel contains only `gen_ai.request.model`, `temperature`, `top_p`, `usage.*` — no response identification, no service tier, no reasoning config.
* After: the panel additionally renders `gen_ai.response.id`, `gen_ai.response.model`, `gen_ai.response.status`, `gen_ai.openai.request.service_tier`, and `gen_ai.request.reasoning_effort` — exposing the turn chain, served-model version, and reasoning/service-tier settings for debugging.
Why

Braintrust's own native Agents SDK processor surfaces the full `response.model_dump(exclude={"input","output","metadata","usage"})` on every LLM span — which is how its UI shows turn-by-turn chains and the exact model version that served each request. Previously openllmetry's equivalent `openai.response` span carried only `temperature`, `top_p`, `max_tokens` and a conflated `gen_ai.request.model`. Trace backends couldn't:

* reconstruct turn chains (`response.id` / `previous_response_id`) — critical for debugging agents that use `auto_previous_response_id=True`.
* recover the exact model version that served a request (`gpt-5.4-2026-03-05` was lost).
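To make the turn-chaining point concrete, here is a hypothetical backend-side helper (plain dicts standing in for span attributes) that reorders LLM spans into a conversation chain using the two new attributes; nothing like this is possible downstream without them:

```python
# Hypothetical backend-side helper: order LLM spans into a turn chain by
# following gen_ai.request.previous_response_id -> gen_ai.response.id links.
# Assumes the spans form a single chain.
def order_turn_chain(spans: list[dict]) -> list[dict]:
    # Map each response id to the span that continues from it.
    next_by_prev = {
        s["gen_ai.request.previous_response_id"]: s
        for s in spans
        if "gen_ai.request.previous_response_id" in s
    }
    # The root turn carries no previous_response_id attribute (the
    # instrumentor only emits it when the field is set on the response).
    roots = [s for s in spans if "gen_ai.request.previous_response_id" not in s]
    chain = []
    current = roots[0] if roots else None
    while current is not None:
        chain.append(current)
        current = next_by_prev.get(current["gen_ai.response.id"])
    return chain
```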
Constants

OTel semconv constants are used where available:

* `GenAIAttributes.GEN_AI_RESPONSE_ID`
* `GenAIAttributes.GEN_AI_RESPONSE_MODEL`
* `GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER`
* `SpanAttributes.LLM_REQUEST_REASONING_EFFORT`
* `SpanAttributes.LLM_REQUEST_REASONING_SUMMARY`

Two fields fall back to string literals because `semconv_ai` doesn't publish a constant yet:

* `gen_ai.response.status`
* `gen_ai.request.previous_response_id`

Happy to add them upstream in `semconv_ai` in a follow-up if preferred.
Tests

Two new direct unit tests on `_extract_response_attributes` (no VCR needed):

* `test_extract_response_captures_response_identification_fields` — feeds a `SimpleNamespace` with every field set, asserts each maps to the correct span attribute with the expected value.
* `test_extract_response_absent_fields_dont_set_attributes` — regression guard for the None-passthrough branches.
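In shape, the tests look roughly like this (a sketch, not the exact test bodies; it assumes, as in the extraction sketch above, that the helper returns a plain dict):

```python
from types import SimpleNamespace

# Import path illustrative, not the real module layout:
# from opentelemetry.instrumentation.openai_agents import _extract_response_attributes

def test_extract_response_captures_response_identification_fields():
    response = SimpleNamespace(
        id="resp_123",
        model="gpt-5.4-2026-03-05",
        status="completed",
        previous_response_id="resp_122",
        service_tier="default",
        reasoning=SimpleNamespace(effort="medium", summary="auto"),
    )
    attrs = _extract_response_attributes(response)
    assert attrs["gen_ai.response.id"] == "resp_123"
    assert attrs["gen_ai.request.previous_response_id"] == "resp_122"

def test_extract_response_absent_fields_dont_set_attributes():
    # Every identification field absent: nothing should be emitted,
    # and in particular no stringified "None" values.
    attrs = _extract_response_attributes(SimpleNamespace(reasoning=None))
    assert "gen_ai.response.status" not in attrs
    assert "None" not in {str(v) for v in attrs.values()}
```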
All 12 tests in `tests/test_openai_agents.py` pass locally. `uv run ruff check` is clean.
Notes

Part of a small series of `openai-agents` parity fixes (#4061 cached_tokens + reasoning_tokens, #4062 tool span type + duration, #4063 tool span input + output). Each stands alone off `main` and can be merged in any order.