Skip to content

test: fix BrowserOS tool test harness regressions#513

Merged
shadowfax92 merged 2 commits intomainfrom
fix/mar20-fix-tests
Mar 20, 2026
Merged

test: fix BrowserOS tool test harness regressions#513
shadowfax92 merged 2 commits intomainfrom
fix/mar20-fix-tests

Conversation

@shadowfax92
Copy link
Contributor

Summary

  • fix stale tool-test expectations for page actions and large page content output
  • make the filesystem timeout test deterministic under the current shell contract
  • replace the flaky live scroll assertion with a stable tool-contract test

Design

This patch stays entirely in the test layer. It updates test helpers and expectations to match current tool contracts, avoids a non-deterministic shell timeout command, and removes dependence on the installed BrowserOS binary's flaky mouseWheel CDP path while still validating the scroll tool contract.

Test plan

  • bun run test
  • bun --env-file=.env.development test apps/server/tests/tools/page-actions.test.ts
  • bun --env-file=.env.development test apps/server/tests/tools/observation.test.ts
  • bun --env-file=.env.development test apps/server/tests/tools/filesystem/bash.test.ts
  • bun --env-file=.env.development test apps/server/tests/tools/input.test.ts

Notes

  • bun run lint reports pre-existing warnings outside the touched files.
  • bun run typecheck is currently blocked by the existing missing generated file apps/agent/.wxt/tsconfig.json.

@github-actions github-actions bot added the test label Mar 20, 2026
@shadowfax92
Copy link
Contributor Author

@claude review

@shadowfax92
Copy link
Contributor Author

@greptileai review

@claude
Copy link

claude bot commented Mar 20, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR fixes four test regressions in the BrowserOS tool test harness, staying entirely in the test layer. It updates stale assertions to match current tool contracts (Content truncated message, workingDir field name), makes the filesystem bash timeout test deterministic by using exec sleep 30 (so proc.kill() targets the sleep process directly rather than its shell wrapper), and replaces a flaky live-browser scroll CDP assertion with a fast, deterministic mock-based contract test.

Key changes:

  • bash.test.ts: sleep 30exec sleep 30 to ensure the spawned process is replaced by sleep itself, making Bun.spawn's proc.kill() reliably terminate the right process
  • observation.test.ts: Updated the large-content assertion from 'Saved page content' to 'Content truncated' plus a path-inclusion check, matching the actual truncation message emitted by the tool
  • page-actions.test.ts: executionDirworkingDir in the createToolContext helper parameter, aligning with the ToolDirectories interface
  • input.test.ts: Replaces the withBrowser-based scroll test (which relied on a live mouseWheel CDP path and a window.scrollY delta assertion that could be flaky) with an inline browser stub that verifies the scroll tool's full contract: correct browser method invocation, response text, and structuredContent

Minor inconsistency noted: The rename of the createToolContext parameter in page-actions.test.ts was not propagated to the local variable names and test descriptions throughout the file, which still reference "execution directory."

Confidence Score: 5/5

  • This PR is safe to merge — all changes are confined to the test layer with no production code touched.
  • All four changes are straightforward test-layer fixes with clear rationale. The bash timeout fix (exec sleep 30) is a well-known POSIX idiom for deterministic process kill, the observation assertions correctly match the existing tool output, the workingDir rename aligns the test helper with the live interface, and the mock-based scroll test properly stubs only the two browser methods exercised by the scroll handler. The only non-critical finding is stale executionDir local variable names in page-actions.test.ts that the partial rename left behind.
  • No files require special attention; the stale local variable naming in page-actions.test.ts is cosmetic only.

Important Files Changed

Filename Overview
packages/browseros-agent/apps/server/tests/tools/filesystem/bash.test.ts Single-line fix changes sleep 30 to exec sleep 30 so proc.kill() targets the sleep process directly (via exec-replacement), making the timeout test deterministic and non-leaky.
packages/browseros-agent/apps/server/tests/tools/input.test.ts Replaces flaky live-browser scroll assertion with a synchronous mock-based contract test; correctly stubs only the two browser methods exercised by the scroll handler (getTabIdForPage and scroll), but the local variable rename in callers (executionDir) was not applied here — this file is fine.
packages/browseros-agent/apps/server/tests/tools/observation.test.ts Updates large-content assertions to match the tool's actual truncation message (Content truncated) and now also verifies the saved file path is present in the output text.
packages/browseros-agent/apps/server/tests/tools/page-actions.test.ts Renames the createToolContext helper parameter from executionDir to workingDir to match the ToolDirectories interface, but all call-site local variables and test descriptions still use the stale "executionDir" / "execution directory" terminology, creating a minor naming inconsistency.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[scroll test] --> B{Live browser\nor mock?}
    B -- Old: Live browser\nwithBrowser --> C[new_page → FORM_PAGE]
    C --> D[scroll CDP call]
    D --> E{scrollY\nincreased?}
    E -- Flaky on\nsome environments --> F[❌ Non-deterministic assertion]

    B -- New: Mock browser --> G[Inline Browser stub\ngetTabIdForPage + scroll]
    G --> H[executeTool scroll]
    H --> I[Assert text includes\n'Scrolled down']
    H --> J[Assert calls array\nmatches expected args]
    H --> K[Assert structuredContent\nmatches schema]
    I & J & K --> L[✅ Deterministic contract test]
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/tests/tools/page-actions.test.ts
Line: 53

Comment:
**Stale `executionDir` local variable names after rename**

The `createToolContext` helper parameter was correctly renamed to `workingDir` to match the `ToolDirectories` interface, but all local variables in the test bodies and the test description at line 53 still use the old `executionDir` / "execution directory" terminology. This creates a naming inconsistency — a reader familiar with the `ToolContext` type will wonder why the tests call the thing they're constructing an "execution directory."

The same stale naming also appears at lines 54, 84, 113–114, 122–124. Consider renaming them for clarity:

```suggestion
  it('save_pdf resolves relative paths against the working directory by default', async () => {
```

And similarly rename `executionDir``workingDir` in the local variables at lines 54, 67, 72, 77, 79, 84, 98, 113–114, 122–124.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "test: fix browseros ..."

@shadowfax92 shadowfax92 merged commit be6ed22 into main Mar 20, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant