Feature Request: Native OpenTelemetry trace export for agent interactions #6319

@kylehounslow

Description

Problem

The Kiro CLI and IDE perform complex agentic workflows — LLM calls, tool invocations, file operations, reasoning steps — but there is no way to export structured telemetry about these interactions. Users building AI-powered applications need to understand agent behavior, debug failures, and measure performance with the same observability tooling they use for the rest of their stack.

Proposed Solution

Export OTLP traces for Kiro agent sessions using the OpenTelemetry GenAI Semantic Conventions.

What to trace

| Span | Key Attributes |
| --- | --- |
| Agent session | gen_ai.system, gen_ai.request.model, session duration |
| LLM call | gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.temperature, latency |
| Tool/MCP invocation | tool name, parameters (redacted), success/failure, duration |
| File operations | path, operation type, result |
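As a sketch of what one such span could carry on the wire, here is a hand-built OTLP/JSON fragment for a tool invocation. Only the gen_ai.* attribute keys come from the GenAI semantic conventions; the helper name, the tool.* keys, and all values are illustrative — a real implementation would let the OTel SDK construct this payload:

```python
import json

# Hypothetical OTLP/JSON fragment for a single tool-invocation span.
# gen_ai.* keys follow the OTel GenAI semantic conventions; the helper
# name and tool.* keys are illustrative placeholders.
def tool_span(tool_name: str, success: bool) -> dict:
    return {
        "name": f"execute_tool {tool_name}",
        "attributes": [
            {"key": "gen_ai.tool.name", "value": {"stringValue": tool_name}},
            {"key": "gen_ai.operation.name", "value": {"stringValue": "execute_tool"}},
            # Parameters are redacted before export, per the table above.
            {"key": "tool.parameters", "value": {"stringValue": "<redacted>"}},
            {"key": "tool.success", "value": {"boolValue": success}},
        ],
    }

print(json.dumps(tool_span("fs_read", True), indent=2))
```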

Configuration

Follow standard OTel SDK conventions — env vars only, zero config files:

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=kiro-cli
kiro chat
```

When OTEL_EXPORTER_OTLP_ENDPOINT is unset, no traces are exported (zero overhead by default).
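A minimal sketch of that gating logic (Python, purely illustrative — the real implementation would live inside the Kiro CLI): export is enabled only when the env var is set, matching standard OTel SDK behavior.

```python
import os
from typing import Optional

# Hypothetical env-var-only gating: tracing is wired up only when
# OTEL_EXPORTER_OTLP_ENDPOINT is set; otherwise nothing is exported.
def otlp_traces_url() -> Optional[str]:
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return None  # unset -> tracing disabled, zero overhead
    # OTLP/HTTP trace exports go to the /v1/traces path on the endpoint.
    return endpoint.rstrip("/") + "/v1/traces"

os.environ.pop("OTEL_EXPORTER_OTLP_ENDPOINT", None)
print(otlp_traces_url())  # None: no traces exported

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
print(otlp_traces_url())  # http://localhost:4318/v1/traces
```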

Example trace

```
[agent_session] kiro-cli chat (12.4s)                          — gen_ai.system: anthropic
  ├── [invoke_agent] Kiro Agent (12.1s)                        — gen_ai.request.model: claude-sonnet-4, gen_ai.agent.name: Kiro Agent
  │   ├── [gen_ai.chat] claude-sonnet-4 (2.1s)                 — gen_ai.usage.input_tokens: 890, gen_ai.usage.output_tokens: 210, finish_reason: tool_calls
  │   ├── [execute_tool] fs_read (45ms)                        — gen_ai.tool.name: fs_read
  │   │   └── [tools/call] fs_read (38ms)                      — SPAN_KIND_CLIENT
  │   ├── [gen_ai.chat] claude-sonnet-4 (3.8s)                 — gen_ai.usage.input_tokens: 1240, gen_ai.usage.output_tokens: 520, finish_reason: tool_calls
  │   ├── [execute_tool] execute_bash (1.2s)                   — gen_ai.tool.name: execute_bash
  │   │   └── [tools/call] execute_bash (1.1s)                 — SPAN_KIND_CLIENT
  │   ├── [gen_ai.chat] claude-sonnet-4 (4.2s)                 — gen_ai.usage.input_tokens: 2100, gen_ai.usage.output_tokens: 380, finish_reason: stop
  │   └── [execute_tool] fs_write (12ms)                       — gen_ai.tool.name: fs_write
  │       └── [tools/call] fs_write (8ms)                      — SPAN_KIND_CLIENT
  └── [http send] response (5ms)
```
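The nesting above falls out of ordinary tracer context managers. As a stdlib-only sketch of the parent/child relationship (span names taken from the trace above; everything else is illustrative — a real implementation would use the OTel SDK's current-span context instead):

```python
import time
from contextlib import contextmanager

# Stdlib-only sketch of how nested spans form a trace tree: each span
# records the span that was active when it started, and children close
# before their parents. Illustrative only; not the OTel SDK.
spans = []
_stack = []

@contextmanager
def span(name):
    record = {"name": name, "parent": _stack[-1] if _stack else None,
              "start": time.monotonic()}
    _stack.append(name)
    try:
        yield record
    finally:
        _stack.pop()
        record["duration_s"] = time.monotonic() - record["start"]
        spans.append(record)  # completed innermost-first

with span("agent_session kiro-cli chat"):
    with span("invoke_agent Kiro Agent"):
        with span("gen_ai.chat claude-sonnet-4"):
            pass
        with span("execute_tool fs_read"):
            with span("tools/call fs_read"):
                pass

for s in spans:
    print(s["name"], "<-", s["parent"])
```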

Why this matters

  1. Debugging — When Kiro makes unexpected changes or takes a wrong path, traces let users see exactly what happened: which model was called, what tools were invoked, and where time was spent.
  2. Composability — Users running Kiro as part of larger agentic systems (CI pipelines, automated workflows) need traces that connect to their existing observability backends (Jaeger, Grafana, OpenSearch, etc.).
  3. Dogfooding — AWS ships OpenTelemetry (ADOT, CloudWatch, X-Ray). Kiro should emit the same telemetry it helps users instrument.
  4. GenAI semconv adoption — The OTel GenAI semantic conventions are stabilizing. Native support in a widely-used AI coding tool drives adoption and validates the spec.

Prior Art

Non-goals

  • Custom UI or dashboard — users bring their own backend
  • Logging/metrics export — traces first, expand later
  • Always-on telemetry — opt-in only via env vars
