feat: move agentic loop and session management to backend by alexinthesky · Pull Request #46 · Consensys/ask-o11y-plugin

alexinthesky · 2026-02-12T21:39:34Z

Note

High Risk
Introduces new backend chat execution, persistence, and SSE streaming paths and changes RBAC enforcement semantics, which can impact authorization and run/session correctness across orgs and deployments.

Overview
Moves the AI chat agentic loop into the Go backend via a new /api/agent/* API, including SSE event streaming, run tracking (in-memory or Redis), cancellation, and token-aware context trimming. The backend now calls grafana-llm-app’s OpenAI-compatible endpoint using a Grafana service-account token and executes MCP tool calls with execution-time RBAC enforcement.

Reworks RBAC from hardcoded tool-name lists to MCP ToolAnnotations (readOnlyHint) propagated from MCP servers, and adds proxy support for tool lookup plus per-server header injection + dial timeouts. Adds backend session CRUD/current-session tracking and persists assistant/tool-call results from streamed events.
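As a rough illustration of annotation-driven RBAC, the sketch below gates tool execution on `readOnlyHint` rather than on a list of tool names. Only `readOnlyHint` comes from MCP ToolAnnotations; the `Role` type, the `canExecuteTool` and `executeTool` helpers, and the policy itself (Viewers limited to read-only tools, Editors and Admins unrestricted) are assumptions made for the example, not the PR's Go implementation.

```typescript
// Hypothetical types: only `readOnlyHint` is taken from MCP ToolAnnotations.
type Role = "Viewer" | "Editor" | "Admin";

interface ToolAnnotations {
  readOnlyHint?: boolean;
}

interface Tool {
  name: string;
  annotations?: ToolAnnotations;
}

// Assumed policy: Viewers may run only tools explicitly marked read-only.
// A missing hint is treated as "may write", so the tool is denied to Viewers.
function canExecuteTool(role: Role, tool: Tool): boolean {
  if (role === "Editor" || role === "Admin") {
    return true;
  }
  return tool.annotations?.readOnlyHint === true;
}

// Re-check the role at execution time, not only when the tool list is built.
function executeTool<T>(role: Role, tool: Tool, run: () => T): T {
  if (!canExecuteTool(role, tool)) {
    throw new Error(`role ${role} may not execute tool ${tool.name}`);
  }
  return run();
}
```

Denying tools that carry no annotation keeps the check fail-closed when an MCP server does not report hints.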

Updates local/dev/test plumbing: adds server:full multi-org compose setup, optional Prometheus datasource provisioning env vars, switches E2E secrets from OPENAI_API_KEY to LLM_API_KEY, and serializes Playwright workers while adding a stop/queue UX in ChatInput for overlapping sends.

Written by Cursor Bugbot for commit 9d5540c.

pkg/plugin/runstore_redis.go

pkg/plugin/runstore.go

…ition

**Issue 1: Playwright using 2 workers despite config setting workers: 1**

Root cause: Top-level `fullyParallel: true` in playwright.config.ts was overriding project-level `workers: 1` settings for chromium-llm-tests.

Fix: Remove top-level `fullyParallel: true` to respect per-project configuration. Projects can now properly control their own parallelism:
- chromium-session-tests: workers: 1, fullyParallel: false
- chromium-llm-tests: workers: 1, fullyParallel: false
- chromium: workers: 6, fullyParallel: true (default)

**Issue 2: Agent runs being canceled mid-stream (500 error after ~500ms)**

Root cause: When a new session is created during `sendMessage()`, the session ID changes from null to the new ID. This triggers the reconnect useEffect cleanup, which aborts the AbortController *while the SSE stream is still active*, causing:

```
rpc error: code = Canceled desc = context canceled
```

Flow that caused the bug:
1. User sends message (sessionId is null)
2. Backend creates session, returns sessionId
3. Frontend calls `setCurrentSessionIdDirect(result.sessionId)`
4. sessionManager.currentSessionId changes (null → new ID)
5. Reconnect useEffect dependency triggers cleanup
6. Cleanup aborts AbortController
7. Active SSE stream in `reconnectToAgentRun` gets canceled
8. Send button never re-enables, E2E tests timeout

Fix: In reconnect effect cleanup, only abort if `!isGenerating`. This prevents aborting the controller while a message is actively being streamed. The abort will still happen when truly needed (user navigates away, explicitly stops generation, or session changes while idle).

**Impact:**
- E2E tests will now properly run LLM tests with single worker
- Agent runs will complete successfully without mid-stream cancellation
- Send button will re-enable after responses complete

**Note on LLM_API_KEY:** Secret is correctly configured and LLM calls are succeeding. The test failures were caused by the SSE abort race condition, not missing API key.

Fixes #46

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
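The Issue 2 guard can be sketched outside React as a plain cleanup factory. `makeReconnectCleanup` and the `isGenerating` callback are hypothetical names for this example; in the PR the check lives in the reconnect `useEffect` cleanup, whose exact code is not shown here.

```typescript
// Hypothetical helper: builds the cleanup function an effect would return.
function makeReconnectCleanup(
  controller: AbortController,
  isGenerating: () => boolean
): () => void {
  return () => {
    // Skip the abort while a response is still streaming, so a session ID
    // change (null -> new ID) cannot cancel the active SSE request.
    if (!isGenerating()) {
      controller.abort();
    }
  };
}

// Usage: cleanup runs while a message is streaming, so the signal stays live.
const controller = new AbortController();
const cleanup = makeReconnectCleanup(controller, () => true);
cleanup();
console.log(controller.signal.aborted);
```

Reading `isGenerating` through a callback (or a ref in React) matters because the cleanup must see the value at the time it runs, not the value captured when the effect was set up.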

pkg/plugin/plugin.go

- Fix double close on channel in RunBroadcaster unsubscribe
- Fix broadcaster race condition in AppendEvent
- Replace hardcoded rgba() colors with semantic Tailwind classes
- Remove dead code for cookie-based LLM auth (Grafana 12 strips cookies)
- Add unit tests for ReasoningIndicator component

All tests passing:
- Frontend unit tests: 552 passed
- Backend Go tests: all passed
- TypeScript type checking: passed
- ESLint: passed (warnings in unmodified code only)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Fixes two medium-severity issues identified by cursor bot:

1. Cancelled runs with no events marked as failed
   - Check if run is still cancellable when no events produced
   - Set status to "cancelled" if run was cancelled early
   - Prevents status mismatch between endpoint response and stored status

2. Redis SubscribeAndSnapshot duplicate-event race condition
   - Add sequence numbers to SSEEvent for ordering
   - Implement atomic sequence tracking in RunStore (in-memory counter)
   - Implement atomic sequence tracking in RedisRunStore (Redis INCR)
   - Add client-side deduplication in agentClient (skip sequence <= lastSeen)
   - Prevents duplicate events when reconnecting to in-progress runs

Low-severity issues deferred to follow-up issue #48:
- Empty assistant message on cancelled runs
- GetBroadcaster dead code removal

All tests passing:
- Frontend unit tests: 552 passed
- Backend Go tests: all passed
- TypeScript type checking: passed
- ESLint: passed (warnings in unmodified code only)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
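The client-side rule "skip sequence <= lastSeen" can be illustrated with a minimal sketch. The `SequencedEvent` shape and the `SequenceDeduper` class are hypothetical and do not reproduce the PR's `agentClient` code; the sketch also assumes sequence numbers start at 1, as a fresh Redis `INCR` counter would.

```typescript
// Hypothetical event shape; only the idea of a per-run sequence number
// comes from the commit message.
interface SequencedEvent {
  sequence: number;
  type: string;
  data: string;
}

class SequenceDeduper {
  // Assumes the first event of a run carries sequence 1.
  private lastSeen = 0;

  // Returns true when the event is new and should be processed.
  accept(event: SequencedEvent): boolean {
    if (event.sequence <= this.lastSeen) {
      return false; // already delivered via the snapshot or an earlier stream
    }
    this.lastSeen = event.sequence;
    return true;
  }
}

// Usage: a reconnect replays a snapshot (1..3), then the live stream
// re-delivers event 3 before continuing with 4.
const deduper = new SequenceDeduper();
const incoming: SequencedEvent[] = [1, 2, 3, 3, 4].map((n) => ({
  sequence: n,
  type: "content",
  data: `chunk ${n}`,
}));
const delivered = incoming.filter((e) => deduper.accept(e)).map((e) => e.sequence);
console.log(delivered);
```

This relies on events arriving in increasing sequence order within a run; an out-of-order event with a lower number would be dropped as a duplicate.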

pkg/agent/llm_client.go

playwright.config.ts

provisioning/datasources/datasources.yaml

… test consolidation

Move the agentic loop from the frontend to a Go backend implementation with SSE streaming, detached execution (survives browser tab close), and automatic reconnection. Includes multi-org support, RBAC enforcement at tool execution, token-aware context window management, and session persistence.

Key changes:
- Backend agentic loop (pkg/agent/) with LLM client, tool execution, context window
- Detached execution with atomic subscribe+snapshot for reconnection
- Session management moved from frontend to backend (in-memory + Redis stores)
- Frontend simplified to thin HTTP client (backendSessionClient.ts)
- Reasoning/thinking content dropped when tool calls are present
- E2E tests consolidated from 17 files to 8 with all 36 tests passing (0 skipped)
- All LLM-dependent tests use consistent wait patterns and fast prompts

Co-Authored-By: Claude Opus 4.6 <[email protected]>
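For readers unfamiliar with token-aware context window management, here is a minimal sketch of one common approach: keep system messages, then keep the most recent messages that fit a token budget. Everything here is an assumption for illustration, including the `ChatMessage` shape, the four-characters-per-token estimate, and the `trimToBudget` name. The PR's Go implementation in pkg/agent/ may work differently.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Keeps every system message, then walks backwards from the newest message
// and keeps each one until the next would exceed the budget.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m), 0);
  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i]);
    if (used + cost > maxTokens) {
      break;
    }
    kept.unshift(rest[i]);
    used += cost;
  }
  return [...system, ...kept];
}
```

A production version would also need to keep each tool call together with its result so the trimmed history stays valid for the LLM API; this sketch does not handle that.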

alexinthesky requested review from gespi1 and mlallai as code owners February 12, 2026 21:39

alexinthesky changed the title ~~Improve code structure and maintainability. (vibe-kanban)~~ refactor: move agentic loop and session management to backend Feb 12, 2026

cursor bot reviewed Feb 12, 2026

Resolved review threads:

- pkg/plugin/runstore_redis.go (2 threads)
- pkg/plugin/runstore.go (2 threads)