fix(eval): include browser context in agent prompt (#530)
Conversation
The eval's single-agent was passing raw `task.query` as the prompt, without browser context (active tab URL, title). The agent didn't know which page it was on, causing it to ask "which website?" instead of browsing. Use `formatUserMessage()` (same as `chat-service.ts`) to include browser context in the prompt. Re-export `formatUserMessage` from `agent/tool-loop`.
Greptile Summary
This PR fixes an eval regression where the single-agent received the raw query without browser context; the fix mirrors exactly what `chat-service.ts` already does.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant SE as SingleAgentEvaluator
    participant B as Browser (CDP)
    participant FU as formatUserMessage()
    participant AI as AiSdkAgent.toolLoopAgent
    SE->>B: listPages()
    B-->>SE: activePage (url, title, pageId)
    SE->>SE: build browserContext from activePage
    SE->>AI: AiSdkAgent.create({ browserContext })
    Note over SE: Inside withEvalTimeout callback
    SE->>FU: formatUserMessage(task.query, browserContext)
    FU-->>SE: formatted prompt with ## Browser Context header
    SE->>AI: generate({ prompt: formattedPrompt })
    AI-->>SE: result (text, toolCalls, toolResults)
```
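The flow in the diagram can be sketched as follows. This is a minimal, self-contained illustration with stubbed browser and formatting logic; the `Page` shape, the stub `browser`, and this `formatUserMessage` body are assumptions for the sketch, not the real eval code.

```typescript
// Hypothetical sketch of the SingleAgentEvaluator flow shown above.
// The real implementations live in the eval app and agent/tool-loop.
interface Page {
  url: string
  title: string
  pageId: string
}

// Stub standing in for the Browser (CDP) participant.
const browser = {
  async listPages(): Promise<Page[]> {
    return [{ url: "https://example.com", title: "Example", pageId: "1" }]
  },
}

// Assumed shape of formatUserMessage: append a "## Browser Context"
// section (URL, title, page ID) below the raw query.
function formatUserMessage(query: string, ctx: Page): string {
  return `${query}\n\n## Browser Context\nURL: ${ctx.url}\nTitle: ${ctx.title}\nPage ID: ${ctx.pageId}`
}

// Mirrors the diagram: fetch the active page, build the prompt from it.
// The generate({ prompt }) call would follow; here we return the prompt.
async function buildEvalPrompt(query: string): Promise<string> {
  const [activePage] = await browser.listPages() // SE->>B: listPages()
  return formatUserMessage(query, activePage)    // SE->>FU
}
```

The key point the diagram makes is that the browser context is captured before the prompt is built, so the model always knows which page is active.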
This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/single-agent.ts
Line: 40
Comment:
**Logged message diverges from actual prompt**
`capture.messageLogger.logUser(task.query)` logs the raw query, but the agent now receives `formatUserMessage(task.query, browserContext)` which includes the `## Browser Context` header. This means the captured eval trajectory will show the bare query as the "user message", while the actual model input also contained the browser context block — making it harder to reproduce or debug a specific eval run from logs alone.
Consider logging the formatted prompt instead:
```suggestion
const prompt = formatUserMessage(task.query, browserContext)
await capture.messageLogger.logUser(prompt)
```
(moving line 40 to after `prompt` is built, and logging `prompt` rather than `task.query`)
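The reordering the comment proposes can be demonstrated end to end. Below is a self-contained sketch: the `messageLogger`, `Ctx` type, and `formatUserMessage` body are stand-ins for the real eval code, used only to show that building the prompt before logging keeps the trajectory log identical to the model input.

```typescript
// Assumed minimal browser-context shape for this sketch.
type Ctx = { url: string; title: string }

// Stand-in for the real formatUserMessage from agent/tool-loop.
function formatUserMessage(query: string, ctx: Ctx): string {
  return `${query}\n\n## Browser Context\n- ${ctx.title}: ${ctx.url}`
}

// Stub logger capturing what would be written to the eval trajectory.
const logged: string[] = []
const messageLogger = {
  async logUser(msg: string): Promise<void> {
    logged.push(msg)
  },
}

async function runTask(query: string, browserContext: Ctx): Promise<string> {
  // Build the prompt first, then log it, so the captured "user message"
  // matches the exact model input (browser-context block included).
  const prompt = formatUserMessage(query, browserContext)
  await messageLogger.logUser(prompt)
  return prompt
}
```

With this ordering, replaying an eval run from its logs reproduces the model's actual input.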
How can I resolve this? If you propose a fix, please make it concise.
Problem
The eval's single-agent was passing raw `task.query` as the prompt without browser context. The agent didn't know which page it was on after Phase 1 navigation, causing it to ask "which website?" and return immediately (0 steps, 2s duration). This affected 4+ tasks that consistently scored 0% and contributed to flaky results on 21 tasks.
Fix
Use `formatUserMessage()` (the same function used by `chat-service.ts`) to include browser context (active tab URL, title, page ID) in the prompt, instead of just the raw query.
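As an illustration only (the URL, title, and exact layout below are invented for this sketch; the real format is whatever `formatUserMessage` produces), the prompt the agent now sees might look like:

```
Find the pricing page for this product

## Browser Context
- URL: https://example.com
- Title: Example Domain
- Page ID: 1
```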
Changes
- `apps/eval/src/agents/single-agent.ts`: use `formatUserMessage(task.query, browserContext)` instead of raw `task.query`
- `apps/server/src/agent/ai-sdk-agent.ts`: re-export `formatUserMessage` from `agent/tool-loop`