fix(eval): include browser context in agent prompt #530

Merged
shivammittal274 merged 1 commit into main from fix/eval-browser-context on Mar 23, 2026

Conversation

@shivammittal274
Contributor

Problem

The eval's single-agent was passing raw task.query as the prompt without browser context. The agent didn't know which page it was on after Phase 1 navigation, causing it to ask "which website?" and return immediately (0 steps, 2s duration).

This affected 4+ tasks that consistently scored 0% and contributed to flaky results on 21 tasks.

Fix

Use formatUserMessage() (same function used by chat-service.ts) to include browser context (active tab URL, title, page ID) in the prompt. The agent now sees:

```
## Browser Context
**Active Tab:** Tab 1 (Page ID: 1) - "University of Pennsylvania" (https://www.upenn.edu/)

---

<USER_QUERY>
Visit the "Contact Us" page and record the phone number...
</USER_QUERY>
```

Instead of just the raw query.

Changes

  • apps/eval/src/agents/single-agent.ts — use formatUserMessage(task.query, browserContext) instead of raw task.query
  • apps/server/src/agent/ai-sdk-agent.ts — re-export formatUserMessage from agent/tool-loop
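The shape of this formatting can be sketched as follows. Note this is a hypothetical reimplementation for illustration only: the `BrowserContext` shape and the exact header layout here are assumptions inferred from the example above, not the actual source of `formatUserMessage` in the server package.

```typescript
// Hypothetical sketch of what formatUserMessage() produces. The real
// implementation lives in the server's agent code (re-exported via
// agent/tool-loop); this BrowserContext shape is simplified.
interface BrowserContext {
  pageId: number;
  title: string;
  url: string;
}

function formatUserMessage(query: string, ctx?: BrowserContext): string {
  // Without context, the agent just gets the bare query block.
  if (!ctx) return `<USER_QUERY>\n${query}\n</USER_QUERY>`;
  const header =
    `## Browser Context\n` +
    `**Active Tab:** Tab ${ctx.pageId} (Page ID: ${ctx.pageId}) - ` +
    `"${ctx.title}" (${ctx.url})`;
  return `${header}\n\n---\n\n<USER_QUERY>\n${query}\n</USER_QUERY>`;
}

const prompt = formatUserMessage(
  'Visit the "Contact Us" page and record the phone number...',
  { pageId: 1, title: "University of Pennsylvania", url: "https://www.upenn.edu/" }
);
console.log(prompt.startsWith("## Browser Context")); // true
```

With the context block prepended, the agent can resolve relative instructions like "Visit the Contact Us page" against the active tab instead of asking which site is meant.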

github-actions bot added the fix label on Mar 23, 2026
shivammittal274 merged commit 94a1a70 into main on Mar 23, 2026
9 of 11 checks passed
@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes an eval regression where the SingleAgentEvaluator was sending only the raw task.query string as the agent prompt, leaving the agent unaware of which browser page it had been navigated to during Phase 1 setup. The agent would immediately ask "which website?" and exit with 0 steps, causing consistent 0% scores on 4+ tasks.

The fix mirrors exactly what chat-service.ts already does: call formatUserMessage(task.query, browserContext) to prepend a ## Browser Context section (active tab URL, title, page ID) before the <USER_QUERY> block. A companion change re-exports formatUserMessage from ai-sdk-agent.ts so it is accessible through the existing @browseros/server/agent/tool-loop import path already used by the eval package.

Key changes:

  • single-agent.ts — replaces prompt: task.query with formatUserMessage(task.query, browserContext), making the prompt consistent with production chat sessions.
  • ai-sdk-agent.ts — adds export { formatUserMessage } from './format-message' to expose the helper via the package's public tool-loop export path.
  • One non-blocking style note: capture.messageLogger.logUser is still called with the raw task.query (line 40), so the captured eval trajectory won't reflect the browser-context prefix that was actually sent to the model.
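The logging divergence in that last note can be demonstrated with a small, self-contained sketch. All names here are illustrative, and `format` is a simplified stand-in for the real `formatUserMessage`:

```typescript
// Illustrative only: shows why logging task.query no longer matches
// the prompt actually sent to the model after this PR.
const taskQuery = 'Visit the "Contact Us" page and record the phone number...';
const ctx = { url: "https://www.upenn.edu/" };

// Simplified stand-in for formatUserMessage.
const format = (q: string, c: { url: string }): string =>
  `## Browser Context\nActive Tab: ${c.url}\n\n<USER_QUERY>\n${q}\n</USER_QUERY>`;

const logged = taskQuery;            // what logUser(task.query) captures
const sent = format(taskQuery, ctx); // what the model actually receives

console.log(logged === sent); // false: the trajectory omits the context block
```

Logging the formatted prompt instead (as the review comment below suggests) would make captured trajectories reproduce the exact model input.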

Confidence Score: 5/5

  • Safe to merge — targeted two-line fix that aligns eval prompt formatting with production, no functional regressions expected.
  • The change is minimal and well-scoped: it reuses an already-tested utility (formatUserMessage) in the same way as the production chat path. The re-export is a straightforward pass-through. No logic is changed in the agent itself, and the browserContext was already being constructed and passed to AiSdkAgent.create before this PR. The only open item is a P2 logging inconsistency that does not affect eval correctness or production code.
  • No files require special attention.

Important Files Changed

  • packages/browseros-agent/apps/eval/src/agents/single-agent.ts — imports and applies formatUserMessage() to include browser context (active tab URL, title, page ID) in the agent's initial prompt, fixing 0-step failures when the agent didn't know its starting page.
  • packages/browseros-agent/apps/server/src/agent/ai-sdk-agent.ts — adds a re-export of formatUserMessage from ./format-message so that it is accessible via the @browseros/server/agent/tool-loop import path used by the eval package.

Sequence Diagram

```mermaid
sequenceDiagram
    participant SE as SingleAgentEvaluator
    participant B as Browser (CDP)
    participant FU as formatUserMessage()
    participant AI as AiSdkAgent.toolLoopAgent

    SE->>B: listPages()
    B-->>SE: activePage (url, title, pageId)
    SE->>SE: build browserContext from activePage
    SE->>AI: AiSdkAgent.create({ browserContext })
    Note over SE: Inside withEvalTimeout callback
    SE->>FU: formatUserMessage(task.query, browserContext)
    FU-->>SE: formatted prompt with ## Browser Context header
    SE->>AI: generate({ prompt: formattedPrompt })
    AI-->>SE: result (text, toolCalls, toolResults)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/single-agent.ts
Line: 40

Comment:
**Logged message diverges from actual prompt**

`capture.messageLogger.logUser(task.query)` logs the raw query, but the agent now receives `formatUserMessage(task.query, browserContext)` which includes the `## Browser Context` header. This means the captured eval trajectory will show the bare query as the "user message", while the actual model input also contained the browser context block — making it harder to reproduce or debug a specific eval run from logs alone.

Consider logging the formatted prompt instead:

```suggestion
          const prompt = formatUserMessage(task.query, browserContext)
          await capture.messageLogger.logUser(prompt)
```

(moving line 40 to after `prompt` is built, and logging `prompt` rather than `task.query`)

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(eval): include browser context in ag..."
