Feature Request: Native OpenTelemetry trace export for agent interactions #6319

@kylehounslow

Description

Problem

The Kiro CLI and IDE perform complex agentic workflows — LLM calls, tool invocations, file operations, reasoning steps — but there is no way to export structured telemetry about these interactions. Users building AI-powered applications need to understand agent behavior, debug failures, and measure performance with the same observability tooling they use for the rest of their stack.

Proposed Solution

Export OTLP traces for Kiro agent sessions using the OpenTelemetry GenAI Semantic Conventions.

What to trace

| Span | Key Attributes |
| --- | --- |
| Agent session | gen_ai.system, gen_ai.request.model, session duration |
| LLM call | gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.temperature, latency |
| Tool/MCP invocation | tool name, parameters (redacted), success/failure, duration |
| File operations | path, operation type, result |
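As a sketch of what one such span could carry on the wire, here is a hand-built OTLP/JSON fragment for a tool invocation. Only the gen_ai.* attribute keys come from the GenAI semantic conventions; the helper name, the tool.* keys, and all values are illustrative — a real implementation would let the OTel SDK construct this payload:

```python
import json

# Hypothetical OTLP/JSON fragment for a single tool-invocation span.
# gen_ai.* keys follow the OTel GenAI semantic conventions; the helper
# name and tool.* keys are illustrative placeholders.
def tool_span(tool_name: str, success: bool) -> dict:
    return {
        "name": f"execute_tool {tool_name}",
        "attributes": [
            {"key": "gen_ai.tool.name", "value": {"stringValue": tool_name}},
            {"key": "gen_ai.operation.name", "value": {"stringValue": "execute_tool"}},
            # Parameters are redacted before export, per the table above.
            {"key": "tool.parameters", "value": {"stringValue": "<redacted>"}},
            {"key": "tool.success", "value": {"boolValue": success}},
        ],
    }

print(json.dumps(tool_span("fs_read", True), indent=2))
```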

Configuration

Follow standard OTel SDK conventions — env vars only, zero config files:

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=kiro-cli
kiro chat
```

When OTEL_EXPORTER_OTLP_ENDPOINT is unset, no traces are exported (zero overhead by default).
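A minimal sketch of that gating logic (Python, purely illustrative — the real implementation would live inside the Kiro CLI): export is enabled only when the env var is set, matching standard OTel SDK behavior.

```python
import os
from typing import Optional

# Hypothetical env-var-only gating: tracing is wired up only when
# OTEL_EXPORTER_OTLP_ENDPOINT is set; otherwise nothing is exported.
def otlp_traces_url() -> Optional[str]:
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return None  # unset -> tracing disabled, zero overhead
    # OTLP/HTTP trace exports go to the /v1/traces path on the endpoint.
    return endpoint.rstrip("/") + "/v1/traces"

os.environ.pop("OTEL_EXPORTER_OTLP_ENDPOINT", None)
print(otlp_traces_url())  # None: no traces exported

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
print(otlp_traces_url())  # http://localhost:4318/v1/traces
```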

Example trace

```
[agent_session] kiro-cli chat (12.4s)                          — gen_ai.system: anthropic
  ├── [invoke_agent] Kiro Agent (12.1s)                        — gen_ai.request.model: claude-sonnet-4, gen_ai.agent.name: Kiro Agent
  │   ├── [gen_ai.chat] claude-sonnet-4 (2.1s)                 — gen_ai.usage.input_tokens: 890, gen_ai.usage.output_tokens: 210, finish_reason: tool_calls
  │   ├── [execute_tool] fs_read (45ms)                        — gen_ai.tool.name: fs_read
  │   │   └── [tools/call] fs_read (38ms)                      — SPAN_KIND_CLIENT
  │   ├── [gen_ai.chat] claude-sonnet-4 (3.8s)                 — gen_ai.usage.input_tokens: 1240, gen_ai.usage.output_tokens: 520, finish_reason: tool_calls
  │   ├── [execute_tool] execute_bash (1.2s)                   — gen_ai.tool.name: execute_bash
  │   │   └── [tools/call] execute_bash (1.1s)                 — SPAN_KIND_CLIENT
  │   ├── [gen_ai.chat] claude-sonnet-4 (4.2s)                 — gen_ai.usage.input_tokens: 2100, gen_ai.usage.output_tokens: 380, finish_reason: stop
  │   └── [execute_tool] fs_write (12ms)                       — gen_ai.tool.name: fs_write
  │       └── [tools/call] fs_write (8ms)                      — SPAN_KIND_CLIENT
  └── [http send] response (5ms)
```
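The nesting above falls out of ordinary tracer context managers. As a stdlib-only sketch of the parent/child relationship (span names taken from the trace above; everything else is illustrative — a real implementation would use the OTel SDK's current-span context instead):

```python
import time
from contextlib import contextmanager

# Stdlib-only sketch of how nested spans form a trace tree: each span
# records the span that was active when it started, and children close
# before their parents. Illustrative only; not the OTel SDK.
spans = []
_stack = []

@contextmanager
def span(name):
    record = {"name": name, "parent": _stack[-1] if _stack else None,
              "start": time.monotonic()}
    _stack.append(name)
    try:
        yield record
    finally:
        _stack.pop()
        record["duration_s"] = time.monotonic() - record["start"]
        spans.append(record)  # completed innermost-first

with span("agent_session kiro-cli chat"):
    with span("invoke_agent Kiro Agent"):
        with span("gen_ai.chat claude-sonnet-4"):
            pass
        with span("execute_tool fs_read"):
            with span("tools/call fs_read"):
                pass

for s in spans:
    print(s["name"], "<-", s["parent"])
```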

Why this matters

  1. Debugging — When Kiro makes unexpected changes or takes a wrong path, traces let users see exactly what happened: which model was called, what tools were invoked, and where time was spent.
  2. Composability — Users running Kiro as part of larger agentic systems (CI pipelines, automated workflows) need traces that connect to their existing observability backends (Jaeger, Grafana, OpenSearch, etc.).
  3. Dogfooding — AWS ships OpenTelemetry (ADOT, CloudWatch, X-Ray). Kiro should emit the same telemetry it helps users instrument.
  4. GenAI semconv adoption — The OTel GenAI semantic conventions are stabilizing. Native support in a widely-used AI coding tool drives adoption and validates the spec.

Prior Art

Non-goals

  • Custom UI or dashboard — users bring their own backend
  • Logging/metrics export — traces first, expand later
  • Always-on telemetry — opt-in only via env vars
