Operator console for AI agent fleets — runs, traces, cost budgets, regression detection, SLA scoring, and incident routing. The Datadog-shaped layer between agent platforms and SRE/SecOps teams.
The visual surface for the agentobserve fleet operations engine. Real-time-shaped UX showing what enterprise agent observability looks like — run-by-run latency, token cost, SLA adherence, anomaly flagging, trace timelines, and AI-assisted root-cause prompts.
Why this matters: production AI agents fail in ways traditional APM tools don't catch — infinite reasoning loops, hallucinated tool calls, drifting costs across model versions, silent SLA degradation. Datadog and New Relic don't speak agent. AgentObserve does.
observe.kineticgain.com — interactive preview of the operator console.
| Surface | What it shows |
|---|---|
| Dashboard | Active agent count, total runs, average SLA, total spend vs budget. The 30-second fleet health summary. |
| Trace List (sidebar) | Recent runs with status pills (running / completed / anomaly / failed), latency, and cost per call. |
| Trace Detail | Per-run breakdown: latency, tokens, SLA score, cost, the actual reasoning trace. |
| Smart Observability Node | One-click structured prompt for AI-assisted root-cause analysis. Copies the prompt to clipboard for paste into Claude / Gemini / ChatGPT. v0.2 wires this directly via backend proxy. |
| Cost & SLA charts | 24-hour rolling timeseries on spend and SLA adherence. |
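To make the table above concrete, here is a minimal sketch of how the Dashboard summary could be derived from the runs shown in the Trace List. The `TraceRun` and `FleetSummary` shapes, the field names, the 0 to 100 SLA scale, and the `summarizeFleet` helper are all assumptions for illustration; they are not taken from the repo's actual types.

```typescript
// Hypothetical run record; field names are illustrative, not the repo's real schema.
type RunStatus = "running" | "completed" | "anomaly" | "failed";

interface TraceRun {
  id: string;
  agentId: string;
  status: RunStatus;
  latencyMs: number;
  tokens: number;
  costUsd: number;
  slaScore: number; // assumed scale: 0..100, higher is better
}

// Hypothetical shape of the Dashboard's headline numbers.
interface FleetSummary {
  activeAgents: number;
  totalRuns: number;
  averageSla: number;
  totalSpendUsd: number;
  budgetUsedPct: number;
}

// Reduce a list of runs to the fleet health summary.
function summarizeFleet(runs: TraceRun[], budgetUsd: number): FleetSummary {
  // An agent counts as active if it has at least one run still in progress.
  const activeAgents = new Set(
    runs.filter((r) => r.status === "running").map((r) => r.agentId),
  ).size;
  const totalSpendUsd = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const averageSla =
    runs.length === 0
      ? 0
      : runs.reduce((sum, r) => sum + r.slaScore, 0) / runs.length;
  return {
    activeAgents,
    totalRuns: runs.length,
    averageSla,
    totalSpendUsd,
    // Guard against a zero or unset budget rather than dividing by zero.
    budgetUsedPct: budgetUsd > 0 ? (totalSpendUsd / budgetUsd) * 100 : 0,
  };
}
```

Keeping the summary a pure function of the run list means the same code could serve mock data today and live ingestion later.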
Modern AI products ship agents fast. Most of them have zero observability: they trust the LLM, hope the cost stays low, and don't notice the SLA cliff until the customer complains. This dashboard is what agent observability looks like in production, a single-page operator surface that serves several audiences:
- Platform engineers can dock to a wall in their fleet ops war room
- CTOs can read to confirm spend is in budget
- Compliance reviewers can use as evidence the agents are being monitored
- Hiring managers can review in 30 seconds during a portfolio scan
The "Smart Observability Node" panel showcases a working LLM-assisted analysis UX without exposing API keys to the browser. Instead of calling Gemini/Claude/OpenAI from the client (which would embed your key in the JS bundle), the panel:
- Builds a domain-specific structured analysis prompt for the selected trace
- Copies that prompt to your clipboard
- Leaves the last step to you: paste it into your LLM of choice (Claude, Gemini, or ChatGPT) and get the answer
This is honest, secure, and keeps the dashboard buildable without backend infrastructure. Phase 2 wires the same flow through a backend proxy (Cloudflare Workers / Vercel Functions) so the analysis happens inline.
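The copy-prompt flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the panel's actual code: the `TraceForAnalysis` shape, the prompt wording, and the `buildAnalysisPrompt` and `copyPrompt` helpers are hypothetical. The clipboard is passed in as a small interface so the sketch runs outside a browser; in the browser you would pass `navigator.clipboard`, whose `writeText` method matches it.

```typescript
// Hypothetical trace shape for illustration; not the repo's actual type.
interface TraceForAnalysis {
  id: string;
  agentId: string;
  status: "running" | "completed" | "anomaly" | "failed";
  latencyMs: number;
  tokens: number;
  costUsd: number;
  slaScore: number;
  reasoningSteps: string[];
}

// Build a structured root-cause prompt from the selected trace.
// The wording is an assumption; only the pattern matters here.
function buildAnalysisPrompt(trace: TraceForAnalysis): string {
  const steps = trace.reasoningSteps
    .map((step, i) => `${i + 1}. ${step}`)
    .join("\n");
  return [
    "You are an SRE reviewing an AI agent run. Identify the most likely root cause.",
    "",
    `Run: ${trace.id} (agent ${trace.agentId})`,
    `Status: ${trace.status}`,
    `Latency: ${trace.latencyMs} ms`,
    `Tokens: ${trace.tokens}`,
    `Cost: $${trace.costUsd.toFixed(4)}`,
    `SLA score: ${trace.slaScore}`,
    "",
    "Reasoning trace:",
    steps,
    "",
    "Answer with: (1) root cause, (2) evidence from the trace, (3) suggested fix.",
  ].join("\n");
}

// Minimal subset of the browser Clipboard API that this sketch relies on.
interface ClipboardLike {
  writeText(text: string): Promise<void>;
}

// Build the prompt and hand it to the clipboard. No API key is involved,
// because no LLM request is made from the client.
async function copyPrompt(
  trace: TraceForAnalysis,
  clipboard: ClipboardLike,
): Promise<string> {
  const prompt = buildAnalysisPrompt(trace);
  await clipboard.writeText(prompt);
  return prompt;
}
```

In a click handler this would be called as `copyPrompt(selectedTrace, navigator.clipboard)`. Because the prompt builder is a pure function, a later backend proxy could reuse it unchanged and only swap the delivery step.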
| Layer | Tech |
|---|---|
| Framework | React 19 + Vite 6 |
| Language | TypeScript 5.8 |
| Styling | Tailwind 4 (CSS variable theme) |
| Charts | Recharts |
| Animations | Motion |
| Icons | Lucide React |
| Date utilities | date-fns |
| Fonts | IBM Plex Sans / Mono / Serif |
| Theme | GitHub-style dark + blue accents — the SRE/observability semiotic (blue = monitoring, green = healthy, amber = warn, red = anomaly) |
git clone https://github.com/mizcausevic-dev/agentobserve-dashboard.git
cd agentobserve-dashboard
npm install
npm run dev

Open http://localhost:3000.

npm run build # tsc + vite build → dist/
npm run preview # serve dist/ locally to test

- Operator console UX (Dashboard / Traces / Analytics / API Keys tabs)
- Trace list with status pills and SLA scoring
- Trace detail panel with latency, cost, tokens, reasoning trace
- Smart Observability Node — copy-prompt pattern for AI-assisted analysis
- 24-hour cost and SLA charts
- AGPL-3.0 license + CI matrix on Node 20 / 22
- Live agent fleet ingestion — webhook endpoint or polling against OpenTelemetry / OpenLLMetry
- Backend proxy (Cloudflare Workers) for inline AI analysis (replaces clipboard pattern)
- Persistent run history (KV-backed)
- Real-time anomaly detection on cost / latency / SLA spikes
- Multi-agent fleet view (compare N agent versions side-by-side)
- Webhook / Slack / PagerDuty alerts on regression or budget breach
- Trace replay UI for incident postmortems
- Exportable PDF SLA reports
- SaaS tiers (Free per-fleet / Pro / Team)
- Enterprise feature: org-wide agent inventory + SOC 2 evidence collection
- White-label / on-prem deploys
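As a rough illustration of the planned anomaly detection on cost, latency, or SLA spikes, one common approach is a rolling z-score over recent values. The sketch below is an assumption about how such a detector could look, not a description of the engine's design; the `detectSpikes` name, the window size, and the threshold are all hypothetical.

```typescript
// Flag indices whose value deviates from the mean of the preceding `window`
// values by more than `threshold` standard deviations (rolling z-score).
// Hypothetical helper; defaults are illustrative, not tuned.
function detectSpikes(
  values: number[],
  window: number = 20,
  threshold: number = 3,
): number[] {
  const spikes: number[] = [];
  values.forEach((current, i) => {
    if (i < window) return; // not enough history yet
    const recent = values.slice(i - window, i);
    const mean = recent.reduce((a, b) => a + b, 0) / window;
    const variance =
      recent.reduce((a, b) => a + (b - mean) ** 2, 0) / window;
    const std = Math.sqrt(variance);
    if (std === 0) {
      // Perfectly flat baseline: any change at all counts as a spike.
      if (current !== mean) spikes.push(i);
      return;
    }
    if (Math.abs(current - mean) / std > threshold) spikes.push(i);
  });
  return spikes;
}
```

The same function could be run per metric (cost per run, latency, SLA score), with the returned indices mapped back to runs to drive the anomaly status pill or an alert.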
This dashboard is the visual surface for agentobserve — the actual TypeScript implementation of the fleet ops engine. The two are deliberately split:
- agentobserve = the engine (run collectors, SLA scorers, anomaly detectors, OpenTelemetry adapters)
- agentobserve-dashboard = the UX (this repo)
Run them together for a complete fleet ops experience, or run the dashboard standalone to showcase the UX.
AGPL-3.0-only. See LICENSE.
The AGPL means: you can fork, modify, and self-host this — but if you run a modified version as a service, you must publish your source. This protects the project's monetization runway while keeping it genuinely open-source for platform teams.
For commercial licensing inquiries (closed-source forks, white-label deployments, on-prem with proprietary modifications), contact: miz@kineticgain.com
Miz Causevic, Director of Web Engineering · Platform Architecture

mizcausevic-dev.github.io · github.com/mizcausevic-dev · gv.kineticgain.com · mcp.kineticgain.com · rag.kineticgain.com
Connect: LinkedIn · Kinetic Gain · Medium · Skills