ci: run BrowserOS test suites on PRs by shadowfax92 · Pull Request #514 · browseros-ai/BrowserOS

shadowfax92 · 2026-03-20T19:19:24Z

Summary

Enable the BrowserOS test workflow on pull requests and manual dispatch for packages/browseros-agent
Provision the latest Linux BrowserOS AppImage in CI with ETag-based caching
Run tools, integration, and SDK test suites on ubuntu-latest with headless mode and Linux-safe flags
Publish JUnit test reports as non-blocking PR checks for visibility into failures
Add extraArgs and headless support to test runtime so CI can pass --no-sandbox --disable-dev-shm-usage

Non-blocking by design

Test report checks show failures but do not block PRs (fail_on_failure: false). This lets the team see which tests pass/fail on Linux and fix them incrementally.

Current results

tools: 196/197 pass (1 flaky: get_active_page)
integration: all pass
sdk: has failures (pre-existing)

Test plan

CI runs on PR push and all 3 suites complete
JUnit reports published with per-suite check names
bun run typecheck passes for @browseros/server
git diff --check clean

🤖 Generated with Claude Code

shadowfax92 · 2026-03-20T19:19:35Z

@claude review

shadowfax92 · 2026-03-20T19:19:35Z

@greptileai review

claude · 2026-03-20T19:19:50Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

greptile-apps · 2026-03-20T19:23:14Z

Greptile Summary

This PR enables the BrowserOS test CI workflow on pull requests, switching from a disabled on: [] trigger to a full matrix run across tools, integration, and sdk suites on ubuntu-latest. It provisions the BrowserOS AppImage with ETag-based caching, wires a new extraArgs / headless pipeline through the TypeScript test helpers so that Linux-required flags (--no-sandbox --disable-dev-shm-usage) can be injected at runtime, and publishes JUnit reports as intentionally non-blocking PR checks.

Key changes:

.github/workflows/test.yml — activates the workflow on PRs, adds a 3-suite matrix, AppImage caching + wrapper, env file creation, and JUnit reporting
test-runtime.ts — adds extraArgs: string[] to TestRuntimePlan and a parseExtraArgs helper that reads BROWSEROS_TEST_EXTRA_ARGS
browser.ts — spreads extraArgs into the spawn args and replaces commented-out logging with a shouldLogBrowserOutput() guard (active in CI or when BROWSEROS_TEST_DEBUG=true)
setup.ts / with-browser.ts — pass extraArgs through to BrowserConfig
.gitignore — excludes .ci/ and test-results/ artifacts

Issues noted:

The parseExtraArgs helper splits on whitespace only, so arguments whose values contain spaces cannot be expressed through BROWSEROS_TEST_EXTRA_ARGS — worth noting for future extensibility
bun run test:cleanup runs under set +e with no exit-code check; a silent cleanup failure could leave stale state ahead of the test run without any workflow-level warning

Confidence Score: 4/5

Safe to merge — CI-only changes with minor style concerns but no blocking bugs
The extraArgs plumbing is correctly wired end-to-end, previously flagged issues (.env.development missing, non-blocking intent) are addressed, and the AppImage provisioning logic is sound. Two minor style concerns remain: the whitespace-only arg splitter and the silent cleanup failure under set +e, neither of which blocks test execution or produces incorrect results with the current configuration.
.github/workflows/test.yml — review the set +e + bun run test:cleanup interaction and the URL-fallback cache key behaviour (pre-existing)

Important Files Changed

Filename	Overview
.github/workflows/test.yml	New CI workflow enabling matrix test runs (tools/integration/sdk) on Ubuntu with AppImage provisioning, ETag-based caching, JUnit reporting, and intentionally non-blocking pass/fail. Two previously flagged issues (.env.development missing, silent CI green) are addressed. Minor concern: cleanup failure is silently swallowed under `set +e`.
packages/browseros-agent/apps/server/tests/helpers/test-runtime.ts	Adds `extraArgs: string[]` to `TestRuntimePlan` and a `parseExtraArgs` helper that splits `BROWSEROS_TEST_EXTRA_ARGS` on whitespace. Works correctly for the current CI flags; whitespace-only splitting is a minor future limitation.
packages/browseros-agent/apps/server/tests/helpers/browser.ts	Adds `extraArgs` to `BrowserConfig` and spreads them into the spawn args array before `--user-data-dir`. Replaces commented-out debug logging blocks with a clean `shouldLogBrowserOutput()` guard (CI or explicit debug flag). No issues.
packages/browseros-agent/apps/server/tests/helpers/setup.ts	Threads `extraArgs: runtimePlan.extraArgs` from `TestRuntimePlan` into `BrowserConfig`. Straightforward plumbing change, no issues.
packages/browseros-agent/apps/server/tests/helpers/with-browser.ts	Same `extraArgs` plumbing as `setup.ts` — adds the field to the `BrowserConfig` built inside `getOrCreateBrowser`. No issues.
packages/browseros-agent/.gitignore	Adds `.ci/` and `test-results/` to `.gitignore` to exclude CI artifacts. Correct and complete.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR Push / workflow_dispatch] --> B[Matrix strategy: tools · integration · sdk]
    B --> C[Checkout + Setup Bun + Install deps]
    C --> D[HEAD request to AppImage URL]
    D --> E{ETag or Last-Modified\npresent?}
    E -->|Yes| F[Derive cache key from header value]
    E -->|No| G[Fallback: hash of URL constant\n⚠ key never rotates]
    F --> H[actions/cache restore]
    G --> H
    H --> I{Cache hit?}
    I -->|Yes| J[Use cached BrowserOS.AppImage]
    I -->|No| K[curl download AppImage]
    K --> J
    J --> L[Write browseros wrapper script\nAPPIMAGE_EXTRACT_AND_RUN=1]
    L --> M[cp .env.example → .env.development]
    M --> N[set +e · bun run test:cleanup]
    N --> O[bun test suite with JUnit reporter\nBROWSEROS_TEST_HEADLESS=true\n--no-sandbox --disable-dev-shm-usage]
    O --> P[Capture exit_code]
    P --> Q{JUnit XML\nwritten?}
    Q -->|No| R[Generate fallback JUnit XML]
    Q -->|Yes| S[Upload artifact + Publish check\nfail_on_failure: false]
    R --> S
    S --> T[Write Step Summary\npass or fail]

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/tests/__helpers__/test-runtime.ts
Line: 27-35

Comment:
**`parseExtraArgs` cannot handle quoted arguments with embedded spaces**

The current implementation splits strictly on whitespace. This means any future flag whose value contains a space (e.g. `--user-agent="Mozilla/5.0 (compatible)"` or `--window-size=1280,720`) would be split incorrectly into multiple tokens.

For now `--no-sandbox --disable-dev-shm-usage` works fine, but a small note in the function's JSDoc or a shell-style tokeniser (splitting on spaces unless inside quotes) would make it safer to extend later:

```suggestion
function parseExtraArgs(value: string | undefined): string[] {
  if (!value) {
    return []
  }
  // Note: this splits on whitespace only; arguments whose values contain
  // spaces must be passed as separate entries in a future array env var.
  return value
    .split(/\s+/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/test.yml
Line: 103-107

Comment:
**`bun run test:cleanup` failure is silently swallowed under `set +e`**

`set +e` is applied before `bun run test:cleanup`, so if the cleanup script exits non-zero (e.g. a dangling process that couldn't be killed, or a missing script), the job silently continues into the test run. This could leave stale state that causes spurious test failures, with no indication that cleanup was the root cause.

A targeted `set -e` around the cleanup — or at minimum checking its exit code — would surface the problem immediately:

```yaml
bun run test:cleanup || echo "::warning::test:cleanup exited with $? — stale state may affect results"
```

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: "chore: add CI artifa..."}

.github/workflows/test.yml

shadowfax92 · 2026-03-20T19:30:07Z

Follow-up on the AI review:

The .env.development concern is incorrect for this repo. packages/browseros-agent/apps/server/.env.development is committed and is the same file used in the local Bun/JUnit validation for this branch.
The workflow staying non-blocking is intentional for this first pass; that requirement came directly from the task.
AppImage caching is a reasonable follow-up if CI runtime becomes a problem, but I am not adding extra cache invalidation surface to the initial rollout.

shadowfax92 · 2026-03-20T19:35:22Z

Addressing the follow-up feedback in the latest commit:

CI now creates packages/browseros-agent/apps/server/.env.development from .env.example before the Bun test step.
The BrowserOS AppImage is now cached with a key derived from remote artifact metadata, with a deterministic fallback.
The workflow remains intentionally non-blocking.

shadowfax92 · 2026-03-20T19:35:31Z

@greptileai review

shadowfax92 · 2026-03-20T19:35:31Z

@claude review

claude · 2026-03-20T19:35:49Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

.github/workflows/test.yml

shadowfax92 · 2026-03-21T01:19:32Z

Addressing the latest failure mode in the new head commit:

The real Ubuntu issue was BrowserOS never reaching CDP, not the env bootstrap.
The test runtime now accepts CI-only extra browser flags and logs browser stderr in CI.
The workflow now sets --no-sandbox --disable-dev-shm-usage for test launches and publishes report checks under distinct names.

shadowfax92 · 2026-03-21T01:19:43Z

@claude review

shadowfax92 · 2026-03-21T01:19:43Z

@greptileai review

claude · 2026-03-21T01:19:57Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

.github/workflows/test.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shadowfax92 · 2026-03-21T16:38:39Z

@greptileai review

shadowfax92 · 2026-03-21T16:38:40Z

@claude review

claude · 2026-03-21T16:38:58Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

…ution The JUnit report action creates check runs that GitHub associates with the CLA check suite instead of the Tests check suite, causing test reports to appear under "CLA Assistant" in the PR checks UI. Remove the action and rely on job status + step summary + artifact upload for test result visibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ci: run browseros tests on pull requests

f0c9aa6

github-actions bot added the CI/CD label Mar 20, 2026

greptile-apps bot reviewed Mar 20, 2026

View reviewed changes

.github/workflows/test.yml Show resolved Hide resolved

refactor: rework 0320-github_action_for_tests based on feedback

45775f3

greptile-apps bot reviewed Mar 20, 2026

View reviewed changes

.github/workflows/test.yml Show resolved Hide resolved

refactor: rework 0320-github_action_for_tests based on feedback

60660c5

greptile-apps bot reviewed Mar 21, 2026

View reviewed changes

.github/workflows/test.yml Show resolved Hide resolved

chore: add CI artifacts to .gitignore

55e31c1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shadowfax92 merged commit ba78923 into main Mar 21, 2026
9 checks passed

Conversation

shadowfax92 commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Non-blocking by design

Current results

Test plan

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

claude bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

shadowfax92 commented Mar 20, 2026

Uh oh!

claude bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

shadowfax92 commented Mar 21, 2026

Uh oh!

shadowfax92 commented Mar 21, 2026

Uh oh!

shadowfax92 commented Mar 21, 2026

Uh oh!

claude bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

shadowfax92 commented Mar 21, 2026

Uh oh!

shadowfax92 commented Mar 21, 2026

Uh oh!

claude bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shadowfax92 commented Mar 20, 2026 •

edited

Loading

claude bot commented Mar 20, 2026 •

edited

Loading

greptile-apps bot commented Mar 20, 2026 •

edited

Loading

claude bot commented Mar 20, 2026 •

edited

Loading

claude bot commented Mar 21, 2026 •

edited

Loading

claude bot commented Mar 21, 2026 •

edited

Loading