fix(eval): include browser context in agent prompt #530

Merged
shivammittal274 merged 1 commit into main from fix/eval-browser-context on Mar 23, 2026

Conversation

@shivammittal274
Contributor

Problem

The eval's single-agent was passing raw task.query as the prompt without browser context. The agent didn't know which page it was on after Phase 1 navigation, causing it to ask "which website?" and return immediately (0 steps, 2s duration).

This affected 4+ tasks that consistently scored 0% and contributed to flaky results on 21 tasks.

Fix

Use formatUserMessage() (same function used by chat-service.ts) to include browser context (active tab URL, title, page ID) in the prompt. The agent now sees:

```
## Browser Context
**Active Tab:** Tab 1 (Page ID: 1) - "University of Pennsylvania" (https://www.upenn.edu/)

---

<USER_QUERY>
Visit the "Contact Us" page and record the phone number...
</USER_QUERY>
```

Instead of just the raw query.

Changes

  • apps/eval/src/agents/single-agent.ts — use formatUserMessage(task.query, browserContext) instead of raw task.query
  • apps/server/src/agent/ai-sdk-agent.ts — re-export formatUserMessage from agent/tool-loop
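The shape of this formatting can be sketched as follows. Note this is a hypothetical reimplementation for illustration only: the `BrowserContext` shape and the exact header layout here are assumptions inferred from the example above, not the actual source of `formatUserMessage` in the server package.

```typescript
// Hypothetical sketch of what formatUserMessage() produces. The real
// implementation lives in the server's agent code (re-exported via
// agent/tool-loop); this BrowserContext shape is simplified.
interface BrowserContext {
  pageId: number;
  title: string;
  url: string;
}

function formatUserMessage(query: string, ctx?: BrowserContext): string {
  // Without context, the agent just gets the bare query block.
  if (!ctx) return `<USER_QUERY>\n${query}\n</USER_QUERY>`;
  const header =
    `## Browser Context\n` +
    `**Active Tab:** Tab ${ctx.pageId} (Page ID: ${ctx.pageId}) - ` +
    `"${ctx.title}" (${ctx.url})`;
  return `${header}\n\n---\n\n<USER_QUERY>\n${query}\n</USER_QUERY>`;
}

const prompt = formatUserMessage(
  'Visit the "Contact Us" page and record the phone number...',
  { pageId: 1, title: "University of Pennsylvania", url: "https://www.upenn.edu/" }
);
console.log(prompt.startsWith("## Browser Context")); // true
```

With the context block prepended, the agent can resolve relative instructions like "Visit the Contact Us page" against the active tab instead of asking which site is meant.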

github-actions bot added the fix label on Mar 23, 2026
shivammittal274 merged commit 94a1a70 into main on Mar 23, 2026
9 of 11 checks passed
@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes an eval regression where the SingleAgentEvaluator was sending only the raw task.query string as the agent prompt, leaving the agent unaware of which browser page it had been navigated to during Phase 1 setup. The agent would immediately ask "which website?" and exit with 0 steps, causing consistent 0% scores on 4+ tasks.

The fix mirrors exactly what chat-service.ts already does: call formatUserMessage(task.query, browserContext) to prepend a ## Browser Context section (active tab URL, title, page ID) before the <USER_QUERY> block. A companion change re-exports formatUserMessage from ai-sdk-agent.ts so it is accessible through the existing @browseros/server/agent/tool-loop import path already used by the eval package.

Key changes:

  • single-agent.ts — replaces prompt: task.query with formatUserMessage(task.query, browserContext), making the prompt consistent with production chat sessions.
  • ai-sdk-agent.ts — adds export { formatUserMessage } from './format-message' to expose the helper via the package's public tool-loop export path.
  • One non-blocking style note: capture.messageLogger.logUser is still called with the raw task.query (line 40), so the captured eval trajectory won't reflect the browser-context prefix that was actually sent to the model.
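The logging divergence in that last note can be demonstrated with a small, self-contained sketch. All names here are illustrative, and `format` is a simplified stand-in for the real `formatUserMessage`:

```typescript
// Illustrative only: shows why logging task.query no longer matches
// the prompt actually sent to the model after this PR.
const taskQuery = 'Visit the "Contact Us" page and record the phone number...';
const ctx = { url: "https://www.upenn.edu/" };

// Simplified stand-in for formatUserMessage.
const format = (q: string, c: { url: string }): string =>
  `## Browser Context\nActive Tab: ${c.url}\n\n<USER_QUERY>\n${q}\n</USER_QUERY>`;

const logged = taskQuery;            // what logUser(task.query) captures
const sent = format(taskQuery, ctx); // what the model actually receives

console.log(logged === sent); // false: the trajectory omits the context block
```

Logging the formatted prompt instead (as the review comment below suggests) would make captured trajectories reproduce the exact model input.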

Confidence Score: 5/5

  • Safe to merge — targeted two-line fix that aligns eval prompt formatting with production, no functional regressions expected.
  • The change is minimal and well-scoped: it reuses an already-tested utility (formatUserMessage) in the same way as the production chat path. The re-export is a straightforward pass-through. No logic is changed in the agent itself, and the browserContext was already being constructed and passed to AiSdkAgent.create before this PR. The only open item is a P2 logging inconsistency that does not affect eval correctness or production code.
  • No files require special attention.

Important Files Changed

  • packages/browseros-agent/apps/eval/src/agents/single-agent.ts — imports and applies formatUserMessage() to include browser context (active tab URL, title, page ID) in the agent's initial prompt, fixing 0-step failures when the agent didn't know its starting page.
  • packages/browseros-agent/apps/server/src/agent/ai-sdk-agent.ts — adds a re-export of formatUserMessage from ./format-message so that it is accessible via the @browseros/server/agent/tool-loop import path used by the eval package.

Sequence Diagram

```mermaid
sequenceDiagram
    participant SE as SingleAgentEvaluator
    participant B as Browser (CDP)
    participant FU as formatUserMessage()
    participant AI as AiSdkAgent.toolLoopAgent

    SE->>B: listPages()
    B-->>SE: activePage (url, title, pageId)
    SE->>SE: build browserContext from activePage
    SE->>AI: AiSdkAgent.create({ browserContext })
    Note over SE: Inside withEvalTimeout callback
    SE->>FU: formatUserMessage(task.query, browserContext)
    FU-->>SE: formatted prompt with ## Browser Context header
    SE->>AI: generate({ prompt: formattedPrompt })
    AI-->>SE: result (text, toolCalls, toolResults)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/eval/src/agents/single-agent.ts
Line: 40

Comment:
**Logged message diverges from actual prompt**

`capture.messageLogger.logUser(task.query)` logs the raw query, but the agent now receives `formatUserMessage(task.query, browserContext)` which includes the `## Browser Context` header. This means the captured eval trajectory will show the bare query as the "user message", while the actual model input also contained the browser context block — making it harder to reproduce or debug a specific eval run from logs alone.

Consider logging the formatted prompt instead:

```suggestion
          const prompt = formatUserMessage(task.query, browserContext)
          await capture.messageLogger.logUser(prompt)
```

(moving line 40 to after `prompt` is built, and logging `prompt` rather than `task.query`)

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(eval): include browser context in ag..."
