test: fix BrowserOS tool test harness regressions by shadowfax92 · Pull Request #513 · browseros-ai/BrowserOS

shadowfax92 · 2026-03-20T18:49:26Z

Summary

fix stale tool-test expectations for page actions and large page content output
make the filesystem timeout test deterministic under the current shell contract
replace the flaky live scroll assertion with a stable tool-contract test

Design

This patch stays entirely in the test layer. It updates test helpers and expectations to match current tool contracts, avoids a non-deterministic shell timeout command, and removes dependence on the installed BrowserOS binary's flaky mouseWheel CDP path while still validating the scroll tool contract.

Test plan

bun run test
bun --env-file=.env.development test apps/server/tests/tools/page-actions.test.ts
bun --env-file=.env.development test apps/server/tests/tools/observation.test.ts
bun --env-file=.env.development test apps/server/tests/tools/filesystem/bash.test.ts
bun --env-file=.env.development test apps/server/tests/tools/input.test.ts

Notes

bun run lint reports pre-existing warnings outside the touched files.
bun run typecheck is currently blocked by the existing missing generated file apps/agent/.wxt/tsconfig.json.

shadowfax92 · 2026-03-20T18:51:04Z

@claude review

shadowfax92 · 2026-03-20T18:51:04Z

@greptileai review

claude · 2026-03-20T18:51:20Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

greptile-apps · 2026-03-20T18:54:22Z

Greptile Summary

This PR fixes four test regressions in the BrowserOS tool test harness, staying entirely in the test layer. It updates stale assertions to match current tool contracts (Content truncated message, workingDir field name), makes the filesystem bash timeout test deterministic by using exec sleep 30 (so proc.kill() targets the sleep process directly rather than its shell wrapper), and replaces a flaky live-browser scroll CDP assertion with a fast, deterministic mock-based contract test.

Key changes:

bash.test.ts: sleep 30 → exec sleep 30 to ensure the spawned process is replaced by sleep itself, making Bun.spawn's proc.kill() reliably terminate the right process
observation.test.ts: Updated the large-content assertion from 'Saved page content' to 'Content truncated' plus a path-inclusion check, matching the actual truncation message emitted by the tool
page-actions.test.ts: executionDir → workingDir in the createToolContext helper parameter, aligning with the ToolDirectories interface
input.test.ts: Replaces the withBrowser-based scroll test (which relied on a live mouseWheel CDP path and a window.scrollY delta assertion that could be flaky) with an inline browser stub that verifies the scroll tool's full contract: correct browser method invocation, response text, and structuredContent

Minor inconsistency noted: The rename of the createToolContext parameter in page-actions.test.ts was not propagated to the local variable names and test descriptions throughout the file, which still reference "execution directory."

Confidence Score: 5/5

This PR is safe to merge — all changes are confined to the test layer with no production code touched.
All four changes are straightforward test-layer fixes with clear rationale. The bash timeout fix (exec sleep 30) is a well-known POSIX idiom for deterministic process kill, the observation assertions correctly match the existing tool output, the workingDir rename aligns the test helper with the live interface, and the mock-based scroll test properly stubs only the two browser methods exercised by the scroll handler. The only non-critical finding is stale executionDir local variable names in page-actions.test.ts that the partial rename left behind.
No files require special attention; the stale local variable naming in page-actions.test.ts is cosmetic only.

Important Files Changed

Filename	Overview
packages/browseros-agent/apps/server/tests/tools/filesystem/bash.test.ts	Single-line fix changes `sleep 30` to `exec sleep 30` so `proc.kill()` targets the sleep process directly (via exec-replacement), making the timeout test deterministic and non-leaky.
packages/browseros-agent/apps/server/tests/tools/input.test.ts	Replaces flaky live-browser scroll assertion with a synchronous mock-based contract test; correctly stubs only the two browser methods exercised by the scroll handler (`getTabIdForPage` and `scroll`), but the local variable rename in callers (`executionDir`) was not applied here — this file is fine.
packages/browseros-agent/apps/server/tests/tools/observation.test.ts	Updates large-content assertions to match the tool's actual truncation message (`Content truncated`) and now also verifies the saved file path is present in the output text.
packages/browseros-agent/apps/server/tests/tools/page-actions.test.ts	Renames the `createToolContext` helper parameter from `executionDir` to `workingDir` to match the `ToolDirectories` interface, but all call-site local variables and test descriptions still use the stale "executionDir" / "execution directory" terminology, creating a minor naming inconsistency.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[scroll test] --> B{Live browser\nor mock?}
    B -- Old: Live browser\nwithBrowser --> C[new_page → FORM_PAGE]
    C --> D[scroll CDP call]
    D --> E{scrollY\nincreased?}
    E -- Flaky on\nsome environments --> F[❌ Non-deterministic assertion]

    B -- New: Mock browser --> G[Inline Browser stub\ngetTabIdForPage + scroll]
    G --> H[executeTool scroll]
    H --> I[Assert text includes\n'Scrolled down']
    H --> J[Assert calls array\nmatches expected args]
    H --> K[Assert structuredContent\nmatches schema]
    I & J & K --> L[✅ Deterministic contract test]

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/tests/tools/page-actions.test.ts
Line: 53

Comment:
**Stale `executionDir` local variable names after rename**

The `createToolContext` helper parameter was correctly renamed to `workingDir` to match the `ToolDirectories` interface, but all local variables in the test bodies and the test description at line 53 still use the old `executionDir` / "execution directory" terminology. This creates a naming inconsistency — a reader familiar with the `ToolContext` type will wonder why the tests call the thing they're constructing an "execution directory."

The same stale naming also appears at lines 54, 84, 113–114, 122–124. Consider renaming them for clarity:

```suggestion
  it('save_pdf resolves relative paths against the working directory by default', async () => {
```

And similarly rename `executionDir` → `workingDir` in the local variables at lines 54, 67, 72, 77, 79, 84, 98, 113–114, 122–124.

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: "test: fix browseros ..."}

test: fix browseros tool test harness regressions

3087630

github-actions bot added the test label Mar 20, 2026

test: align working directory naming in page action tests

df2763f

shadowfax92 merged commit be6ed22 into main Mar 20, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: fix BrowserOS tool test harness regressions#513

test: fix BrowserOS tool test harness regressions#513
shadowfax92 merged 2 commits intomainfrom
fix/mar20-fix-tests

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

claude bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shadowfax92 commented Mar 20, 2026

Summary

Design

Test plan

Notes

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

claude bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 20, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 20, 2026 •

edited

Loading