Skip to content

ci: run BrowserOS test suites on PRs#514

Merged
shadowfax92 merged 5 commits intomainfrom
feat/github-action-for-tests
Mar 21, 2026
Merged

ci: run BrowserOS test suites on PRs#514
shadowfax92 merged 5 commits intomainfrom
feat/github-action-for-tests

Conversation

@shadowfax92
Copy link
Contributor

@shadowfax92 shadowfax92 commented Mar 20, 2026

Summary

  • Enable the BrowserOS test workflow on pull requests and manual dispatch for packages/browseros-agent
  • Provision the latest Linux BrowserOS AppImage in CI with ETag-based caching
  • Run tools, integration, and SDK test suites on ubuntu-latest with headless mode and Linux-safe flags
  • Publish JUnit test reports as non-blocking PR checks for visibility into failures
  • Add extraArgs and headless support to test runtime so CI can pass --no-sandbox --disable-dev-shm-usage

Non-blocking by design

Test report checks show failures but do not block PRs (fail_on_failure: false). This lets the team see which tests pass/fail on Linux and fix them incrementally.

Current results

  • tools: 196/197 pass (1 flaky: get_active_page)
  • integration: all pass
  • sdk: has failures (pre-existing)

Test plan

  • CI runs on PR push and all 3 suites complete
  • JUnit reports published with per-suite check names
  • bun run typecheck passes for @browseros/server
  • git diff --check clean

🤖 Generated with Claude Code

@shadowfax92
Copy link
Contributor Author

@claude review

@shadowfax92
Copy link
Contributor Author

@greptileai review

@github-actions github-actions bot added the CI/CD label Mar 20, 2026
@claude
Copy link

claude bot commented Mar 20, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR enables the BrowserOS test CI workflow on pull requests, switching from a disabled on: [] trigger to a full matrix run across tools, integration, and sdk suites on ubuntu-latest. It provisions the BrowserOS AppImage with ETag-based caching, wires a new extraArgs / headless pipeline through the TypeScript test helpers so that Linux-required flags (--no-sandbox --disable-dev-shm-usage) can be injected at runtime, and publishes JUnit reports as intentionally non-blocking PR checks.

Key changes:

  • .github/workflows/test.yml — activates the workflow on PRs, adds a 3-suite matrix, AppImage caching + wrapper, env file creation, and JUnit reporting
  • test-runtime.ts — adds extraArgs: string[] to TestRuntimePlan and a parseExtraArgs helper that reads BROWSEROS_TEST_EXTRA_ARGS
  • browser.ts — spreads extraArgs into the spawn args and replaces commented-out logging with a shouldLogBrowserOutput() guard (active in CI or when BROWSEROS_TEST_DEBUG=true)
  • setup.ts / with-browser.ts — pass extraArgs through to BrowserConfig
  • .gitignore — excludes .ci/ and test-results/ artifacts

Issues noted:

  • The parseExtraArgs helper splits on whitespace only, so arguments whose values contain spaces cannot be expressed through BROWSEROS_TEST_EXTRA_ARGS — worth noting for future extensibility
  • bun run test:cleanup runs under set +e with no exit-code check; a silent cleanup failure could leave stale state ahead of the test run without any workflow-level warning

Confidence Score: 4/5

  • Safe to merge — CI-only changes with minor style concerns but no blocking bugs
  • The extraArgs plumbing is correctly wired end-to-end, previously flagged issues (.env.development missing, non-blocking intent) are addressed, and the AppImage provisioning logic is sound. Two minor style concerns remain: the whitespace-only arg splitter and the silent cleanup failure under set +e, neither of which blocks test execution or produces incorrect results with the current configuration.
  • .github/workflows/test.yml — review the set +e + bun run test:cleanup interaction and the URL-fallback cache key behaviour (pre-existing)

Important Files Changed

Filename Overview
.github/workflows/test.yml New CI workflow enabling matrix test runs (tools/integration/sdk) on Ubuntu with AppImage provisioning, ETag-based caching, JUnit reporting, and intentionally non-blocking pass/fail. Two previously flagged issues (.env.development missing, silent CI green) are addressed. Minor concern: cleanup failure is silently swallowed under set +e.
packages/browseros-agent/apps/server/tests/helpers/test-runtime.ts Adds extraArgs: string[] to TestRuntimePlan and a parseExtraArgs helper that splits BROWSEROS_TEST_EXTRA_ARGS on whitespace. Works correctly for the current CI flags; whitespace-only splitting is a minor future limitation.
packages/browseros-agent/apps/server/tests/helpers/browser.ts Adds extraArgs to BrowserConfig and spreads them into the spawn args array before --user-data-dir. Replaces commented-out debug logging blocks with a clean shouldLogBrowserOutput() guard (CI or explicit debug flag). No issues.
packages/browseros-agent/apps/server/tests/helpers/setup.ts Threads extraArgs: runtimePlan.extraArgs from TestRuntimePlan into BrowserConfig. Straightforward plumbing change, no issues.
packages/browseros-agent/apps/server/tests/helpers/with-browser.ts Same extraArgs plumbing as setup.ts — adds the field to the BrowserConfig built inside getOrCreateBrowser. No issues.
packages/browseros-agent/.gitignore Adds .ci/ and test-results/ to .gitignore to exclude CI artifacts. Correct and complete.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR Push / workflow_dispatch] --> B[Matrix strategy: tools · integration · sdk]
    B --> C[Checkout + Setup Bun + Install deps]
    C --> D[HEAD request to AppImage URL]
    D --> E{ETag or Last-Modified\npresent?}
    E -->|Yes| F[Derive cache key from header value]
    E -->|No| G[Fallback: hash of URL constant\n⚠ key never rotates]
    F --> H[actions/cache restore]
    G --> H
    H --> I{Cache hit?}
    I -->|Yes| J[Use cached BrowserOS.AppImage]
    I -->|No| K[curl download AppImage]
    K --> J
    J --> L[Write browseros wrapper script\nAPPIMAGE_EXTRACT_AND_RUN=1]
    L --> M[cp .env.example → .env.development]
    M --> N[set +e · bun run test:cleanup]
    N --> O[bun test suite with JUnit reporter\nBROWSEROS_TEST_HEADLESS=true\n--no-sandbox --disable-dev-shm-usage]
    O --> P[Capture exit_code]
    P --> Q{JUnit XML\nwritten?}
    Q -->|No| R[Generate fallback JUnit XML]
    Q -->|Yes| S[Upload artifact + Publish check\nfail_on_failure: false]
    R --> S
    S --> T[Write Step Summary\npass or fail]
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/tests/__helpers__/test-runtime.ts
Line: 27-35

Comment:
**`parseExtraArgs` cannot handle quoted arguments with embedded spaces**

The current implementation splits strictly on whitespace. This means any future flag whose value contains a space (e.g. `--user-agent="Mozilla/5.0 (compatible)"` or `--window-size=1280,720`) would be split incorrectly into multiple tokens.

For now `--no-sandbox --disable-dev-shm-usage` works fine, but a small note in the function's JSDoc or a shell-style tokeniser (splitting on spaces unless inside quotes) would make it safer to extend later:

```suggestion
function parseExtraArgs(value: string | undefined): string[] {
  if (!value) {
    return []
  }
  // Note: this splits on whitespace only; arguments whose values contain
  // spaces must be passed as separate entries in a future array env var.
  return value
    .split(/\s+/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/test.yml
Line: 103-107

Comment:
**`bun run test:cleanup` failure is silently swallowed under `set +e`**

`set +e` is applied before `bun run test:cleanup`, so if the cleanup script exits non-zero (e.g. a dangling process that couldn't be killed, or a missing script), the job silently continues into the test run. This could leave stale state that causes spurious test failures, with no indication that cleanup was the root cause.

A targeted `set -e` around the cleanup — or at minimum checking its exit code — would surface the problem immediately:

```yaml
bun run test:cleanup || echo "::warning::test:cleanup exited with $? — stale state may affect results"
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "chore: add CI artifa..."

@shadowfax92
Copy link
Contributor Author

Follow-up on the AI review:

  • The .env.development concern is incorrect for this repo. packages/browseros-agent/apps/server/.env.development is committed and is the same file used in the local Bun/JUnit validation for this branch.
  • The workflow staying non-blocking is intentional for this first pass; that requirement came directly from the task.
  • AppImage caching is a reasonable follow-up if CI runtime becomes a problem, but I am not adding extra cache invalidation surface to the initial rollout.

@shadowfax92
Copy link
Contributor Author

Addressing the follow-up feedback in the latest commit:

  • CI now creates packages/browseros-agent/apps/server/.env.development from .env.example before the Bun test step.
  • The BrowserOS AppImage is now cached with a key derived from remote artifact metadata, with a deterministic fallback.
  • The workflow remains intentionally non-blocking.

@shadowfax92
Copy link
Contributor Author

@greptileai review

@shadowfax92
Copy link
Contributor Author

@claude review

@claude
Copy link

claude bot commented Mar 20, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@shadowfax92
Copy link
Contributor Author

Addressing the latest failure mode in the new head commit:

  • The real Ubuntu issue was BrowserOS never reaching CDP, not the env bootstrap.
  • The test runtime now accepts CI-only extra browser flags and logs browser stderr in CI.
  • The workflow now sets --no-sandbox --disable-dev-shm-usage for test launches and publishes report checks under distinct names.

@shadowfax92
Copy link
Contributor Author

@claude review

@shadowfax92
Copy link
Contributor Author

@greptileai review

@claude
Copy link

claude bot commented Mar 21, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shadowfax92
Copy link
Contributor Author

@greptileai review

@shadowfax92
Copy link
Contributor Author

@claude review

@claude
Copy link

claude bot commented Mar 21, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

…ution

The JUnit report action creates check runs that GitHub associates with the
CLA check suite instead of the Tests check suite, causing test reports to
appear under "CLA Assistant" in the PR checks UI.

Remove the action and rely on job status + step summary + artifact upload
for test result visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shadowfax92 shadowfax92 merged commit ba78923 into main Mar 21, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant