feat: add Yutori Navigator n1.5 as a computer-use (CUA) agent provider by lawrencechen98 · Pull Request #2194 · browserbase/stagehand

lawrencechen98 · 2026-06-05T21:52:37Z

Draft — opening early for visibility/feedback. Happy to adjust scope or split (see "Notes for reviewers").

Summary

Adds Yutori Navigator n1.5 as a computer-use agent provider, alongside the existing OpenAI / Anthropic / Google / Microsoft CUA clients. Navigator is a computer-use model (screenshot in, coordinate-based tool_calls out in a normalized 1000×1000 space) served via an OpenAI-compatible Chat Completions API at https://api.yutori.com/v1. Because it's OpenAI-compatible, this reuses Stagehand's existing openai dependency and the provider-agnostic V3CuaAgentHandler — no new dependencies, no handler-shape changes for other providers.

const agent = stagehand.agent({ mode: "cua", model: "yutori/n1.5-latest" });
await agent.execute({ instruction: "...", maxSteps: 30 });

Auth via YUTORI_API_KEY (or clientOptions.apiKey / baseURL). Ships the core tool set (browser_tools_core-20260403).
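The credential lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `resolveYutoriConfig` and `YutoriClientOptions` are hypothetical names, and giving `clientOptions.apiKey` precedence over the environment variable is an assumption. Only the `YUTORI_API_KEY` variable and the base URL come from the description.

```typescript
// Hypothetical option shape; the PR's real ClientOptions has more fields.
interface YutoriClientOptions {
  apiKey?: string;
  baseURL?: string;
}

// Base URL stated in the PR description.
const DEFAULT_BASE_URL = "https://api.yutori.com/v1";

// Resolve credentials from explicit options first, then from the environment.
// The precedence order is an assumption made for this sketch.
function resolveYutoriConfig(
  options: YutoriClientOptions,
  env: Record<string, string | undefined>,
): { apiKey: string; baseURL: string } {
  const apiKey = options.apiKey ?? env["YUTORI_API_KEY"];
  if (!apiKey) {
    throw new Error(
      "Missing Yutori API key: set YUTORI_API_KEY or pass clientOptions.apiKey",
    );
  }
  return { apiKey, baseURL: options.baseURL ?? DEFAULT_BASE_URL };
}
```

The environment is passed in as a plain record so the helper can be exercised without touching real process state.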

What's included

- YutoriCUAClient: screenshot-per-turn loop; tool_call → AgentAction with 1000×1000 → viewport coordinate denormalization; role:"tool" results with a Current URL: suffix; request payload trimming (keep recent screenshots under ~9.5 MB); completion when no tool_calls; stop-and-summarize on max steps. Mirrors the Yutori Python SDK reference loop.
- Provider registration: AgentProvider, AVAILABLE_CUA_MODELS, AgentType, and providerEnvVarMap (yutori → YUTORI_API_KEY); Navigator-specific ClientOptions (toolSet, disableTools, jsonSchema, userTimezone, userLocation) + the cloud API / OpenAPI schema.
- Keyboard modifiers implemented generically: a new optional modifiers option on the understudy page.click() / page.scroll() that sets the CDP mouse-event modifier bitmask (reusable by any provider). Plus hold_key and refresh (via page.reload, with a faithful agent-replay step).
- Evals: the local bench harness no longer builds an AI-SDK text client for CUA-only models (getAISDKLanguageModel has no provider for them and initV3 ignores it in CUA mode). General fix; also unblocks local evals for microsoft/fara-7b.
- Tests: unit coverage for the client (action mapping, message/trajectory shape, structured output, error recovery, stop-and-summarize), the helpers (coordinate denorm/validation, key mapping, payload trimming), the handler (modifiers/hold/refresh + URL freshness), and API serialization; plus a usage example.
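To make the 1000×1000 → viewport denormalization concrete, here is a minimal sketch. The function name echoes the helper mentioned in this PR, but the signature, the clamping, and the rounding are assumptions for illustration and may differ from the actual implementation.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Navigator emits coordinates in a normalized 1000x1000 space (per the PR).
const NORMALIZED_MAX = 1000;

// Map a normalized point to viewport pixels.
// Clamping and rounding behavior here are assumptions of this sketch.
function denormalizeCoordinates(
  x: number,
  y: number,
  viewport: Viewport,
): { x: number; y: number } {
  if (!Number.isFinite(x) || !Number.isFinite(y)) {
    throw new Error(`Invalid coordinates: (${x}, ${y})`);
  }
  // Keep out-of-range model output inside the normalized square.
  const clamp = (v: number): number => Math.min(Math.max(v, 0), NORMALIZED_MAX);
  // Scale, round to whole pixels, and stay within the last valid pixel index.
  const scale = (v: number, size: number): number =>
    Math.min(Math.round((clamp(v) / NORMALIZED_MAX) * size), size - 1);
  return { x: scale(x, viewport.width), y: scale(y, viewport.height) };
}
```

For a 1280×720 viewport, the normalized center (500, 500) maps to pixel (640, 360).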

Testing

- pnpm build (typecheck) + the new unit suites pass; prettier/eslint clean.
- Verified live end-to-end against the real Navigator API (local headless Chrome): multi-step click/type/keypress tasks complete with correct DOM end-state.

Notes for reviewers

- Core tool set only. The expanded/DOM tool set (extract_elements, find, set_element_value, execute_js) is intentionally a follow-up.
- The cloud-API/OpenAPI additions expose Navigator config through the hosted API schema for consistency; happy to drop or split these (and/or the evals harness fix) into separate PRs if you'd prefer a smaller first PR.
- mouse_down/mouse_up are disabled by default (no equivalent in the shared action handler; drag covers press-move-release).

Maintained by the Yutori team.

Summary by cubic

Adds the Yutori Navigator n1.5 computer-use model as a new provider via an OpenAI-compatible Chat Completions API, now defaulting to the expanded DOM tool set for richer page interaction.

New Features
- New yutori/n1.5-latest CUA model with YUTORI_API_KEY auth and options (toolSet, disableTools, jsonSchema, userTimezone, userLocation, temperature).
- Expanded tools (default): extract_elements, find, set_element_value, execute_js built on the a11y snapshot + deepLocator; coordinate tools can target a ref (resolved to on-screen center with scroll-into-view) and recover on stale refs.
- YutoriCUAClient: screenshot-per-turn loop, 1000×1000 coordinate mapping, payload trimming, per-tool results with current URL, stop-and-summarize on max steps (fully flow-logged), and structured parsed_json on AgentResult.output.
- Generic click/scroll modifiers via CDP bitmask (captured and replayed), hold_key delay, and refresh with a recorded replay step; API/OpenAPI exposes provider yutori and Navigator options; .env.example includes YUTORI_API_KEY; local eval harness skips AI-SDK text clients for CUA-only models; example and tests included.
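As an illustration of the payload trimming mentioned above, the sketch below replaces the oldest screenshots with a text placeholder until the request fits a byte budget, and never drops the latest screenshot. The message types, the placeholder text, and the size estimate (length of the serialized JSON) are all assumptions of this sketch. Only the idea of keeping recent screenshots under a size cap comes from the PR.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[];
}

// Rough size estimate: base64 image data is ASCII, so characters ~ bytes.
const approxBytes = (messages: ChatMessage[]): number =>
  JSON.stringify(messages).length;

// Return a trimmed copy; the input messages are left untouched.
function trimImagesToFit(messages: ChatMessage[], maxBytes: number): ChatMessage[] {
  const out: ChatMessage[] = messages.map((m) => ({
    ...m,
    content: Array.isArray(m.content) ? [...m.content] : m.content,
  }));

  // Collect (message index, part index) of every screenshot, oldest first.
  const images: Array<[number, number]> = [];
  out.forEach((m, mi) => {
    if (Array.isArray(m.content)) {
      m.content.forEach((p, pi) => {
        if (p.type === "image_url") images.push([mi, pi]);
      });
    }
  });

  // Drop oldest screenshots first; slice(0, -1) protects the latest one.
  for (const [mi, pi] of images.slice(0, -1)) {
    if (approxBytes(out) <= maxBytes) break;
    const msg = out[mi];
    if (!msg || !Array.isArray(msg.content)) continue;
    msg.content[pi] = { type: "text", text: "[screenshot omitted]" };
  }
  return out;
}
```

Under this design a single oversized latest screenshot is still sent, so the cap is best effort rather than a hard guarantee.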
Migration
- Set YUTORI_API_KEY (and optional baseURL), then use: stagehand.agent({ mode: "cua", model: "yutori/n1.5-latest" }).
- Optional model options: toolSet, disableTools, jsonSchema, userTimezone, userLocation, temperature (use toolSet: "browser_tools_core-20260403" for coordinate-only).

Written for commit fcfbdb1. Summary will update on new commits.

changeset-bot · 2026-06-05T21:52:41Z

🦋 Changeset detected

Latest commit: fcfbdb1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server-v3	Patch


github-actions · 2026-06-05T21:52:50Z

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

Integrates Yutori's Navigator n1.5 computer-use model as a Stagehand CUA provider, mirroring the existing OpenAI/Anthropic/Google/Microsoft CUA clients. Navigator is OpenAI-compatible Chat Completions at https://api.yutori.com/v1 (screenshot in, coordinate tool_calls out in a normalized 1000x1000 space), so this reuses the existing `openai` dependency and the provider-agnostic CUA handler, with no new dependencies.

Usage: stagehand.agent({ mode: "cua", model: "yutori/n1.5-latest" }). Auth via YUTORI_API_KEY or clientOptions (apiKey/baseURL). Core tool set.

- YutoriCUAClient: screenshot-per-turn loop; tool_call -> AgentAction with 1000x1000 coordinate denormalization; role:"tool" results with a current-URL suffix; payload trimming; completion when no tool calls; stop-and-summarize on max steps. Faithful to the Yutori Python SDK reference loop.
- Provider registration (AgentProvider, AVAILABLE_CUA_MODELS, AgentType, providerEnvVarMap) and Navigator ClientOptions (toolSet/disableTools/jsonSchema/userTimezone/userLocation), incl. the cloud API + OpenAPI schema.
- Keyboard modifiers via a general page.click/scroll `modifiers` option (sets the CDP mouse-event modifiers bitmask); hold-key; refresh via page.reload with a faithful agent-replay step.
- Evals: skip building an AI-SDK text client for CUA-only models in the local bench harness path (also unblocks microsoft/fara-7b).
- Unit tests + usage example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cubic-dev-ai

2 issues found across 23 files

Confidence score: 3/5

There is some real merge risk: stopAndSummarize() in packages/core/lib/v3/agent/YutoriCUAClient.ts bypasses expected flowLogger instrumentation, which can reduce traceability and make agent behavior/debugging less reliable.
packages/core/lib/v3/handlers/v3CuaAgentHandler.ts has a concrete replay correctness concern—modifiers used during CUA action execution are not persisted, so replay can diverge for modifier-dependent actions.
Given a high-confidence medium/high-severity instrumentation gap plus a replay-divergence bug, this looks mergeable only with caution rather than a low-risk merge.
Pay close attention to packages/core/lib/v3/agent/YutoriCUAClient.ts and packages/core/lib/v3/handlers/v3CuaAgentHandler.ts - missing flow logging and non-persisted modifiers can cause observability and replay consistency issues.

Architecture diagram

sequenceDiagram
    participant User as User Code
    participant Stagehand as Stagehand Instance
    participant Agent as Agent Provider
    participant Client as YutoriCUAClient
    participant Handler as V3CuaAgentHandler
    participant Page as Understudy Page
    participant CDP as Chrome DevTools Protocol
    participant Navigator as Yutori Navigator API

    Note over User,Navigator: NEW: Yutori Navigator n1.5 CUA Agent Provider

    User->>Stagehand: stagehand.agent({ mode: "cua", model: "yutori/n1.5-latest" })
    Stagehand->>Agent: create provider (modelToAgentProviderMap)
    Agent->>Client: new YutoriCUAClient(type, model, instructions, clientOptions)
    alt API key missing
        Client-->>Agent: throw Error
    end
    Client->>Client: configure toolSet, disableTools, jsonSchema, userTimezone, userLocation
    Stagehand-->>User: agent instance

    User->>Stagehand: agent.execute({ instruction, maxSteps })
    Stagehand->>Handler: new V3CuaAgentHandler(..., client=YutoriCUAClient)

    Handler->>Client: setScreenshotProvider()
    Handler->>Client: setActionHandler()
    Handler->>Handler: capture screenshot (page.screenshot)
    Handler->>Client: setCurrentUrl(page.url())
    Client->>Client: build message history (system prompt + user instruction with location/timezone context)

    loop Step Loop (maxSteps)
        Client->>Client: clone messages for request
        Client->>Client: trimImagesToFit (drop old screenshots under ~9.5 MB, keep latest)
        Client->>Navigator: POST /v1/chat/completions (OpenAI-compatible)
        Note over Client,Navigator: Extra params: tool_set, disable_tools, json_schema
        Navigator-->>Client: response with tool_calls (1000x1000 normalized coordinates)
        Client->>Client: parse tool_calls from assistant message

        alt No tool_calls
            Client-->>Handler: return final result (completed)
        else Has tool_calls
            par For each tool_call
                Client->>Client: denormalizeCoordinates (1000→viewport pixels)
                Client->>Client: mapNavigatorKeyToPlaywright (Navigator keys→Playwright keys)
                Client->>Handler: actionHandler(action)
                alt Action type: click (with possible modifier)
                    Handler->>Page: click(x, y, { modifiers })
                    Page->>CDP: dispatchMouseEvent(..., modifiers bitmask)
                else Action type: scroll (with possible modifier)
                    Handler->>Page: scroll(x, y, deltaX, deltaY, { modifiers })
                    Page->>CDP: dispatchMouseEvent(..., modifiers bitmask)
                else Action type: keypress (with optional holdMs delay)
                    Handler->>Page: keyPress(key, { delay })
                else Action type: refresh
                    Handler->>Page: reload({ waitUntil: "load" })
                    Page->>CDP: Page.reload
                else Action type: type, goto, back, forward, wait, drag
                    Handler->>Page: execute action
                end
                alt Action succeeded
                    Page-->>Handler: success
                else Action threw error
                    Page-->>Handler: error
                    Handler->>Page: still update client URL (page.url())
                    Handler-->>Client: action result with [ERROR]
                end
            end
            Client->>Client: append tool result (role:"tool" + "Current URL:" suffix)
            Client->>Client: captureScreenshot() for next turn
        end

        alt Payload size > max bytes
            Client->>Client: trimImagesToFit (strip old screenshots, keep recent)
        end
    end

    alt Max steps reached (no completion)
        Client->>Client: formatStopAndSummarize(task)
        Client->>Navigator: final request with summarize prompt (no json_schema)
        Navigator-->>Client: summary text
        Client-->>Handler: return result (completed=false, summary message)
    end

    Handler-->>Stagehand: AgentResult (output may include parsed_json)
    Stagehand-->>User: execution result
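The tool-result step in the diagram (a role:"tool" message with a "Current URL:" suffix, and an [ERROR] marker when an action throws) might look roughly like the sketch below. The `buildToolResult` name, the outcome type, and the exact wording and line layout of the content string are assumptions made for illustration. Only the role, the suffix label, and the error marker appear in this PR.

```typescript
// Shape of an OpenAI-compatible tool result message.
interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical outcome type for one executed action.
type ActionOutcome =
  | { ok: true; text?: string }
  | { ok: false; error: string };

// Build the message sent back to the model for one tool call.
// The URL is appended on failure too, so the model always sees the fresh location.
function buildToolResult(
  toolCallId: string,
  outcome: ActionOutcome,
  currentUrl: string,
): ToolResultMessage {
  const body = outcome.ok
    ? outcome.text ?? "Action completed."
    : `[ERROR] ${outcome.error}`;
  return {
    role: "tool",
    tool_call_id: toolCallId,
    content: `${body}\nCurrent URL: ${currentUrl}`,
  };
}
```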


Address review: stopAndSummarize() made a direct chat.completions.create call without FlowLogger instrumentation. Wrap it with FlowLogger.logLlmRequest/logLlmResponse, mirroring predict(), so every direct Navigator LLM call is flow-logged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rolls

Address review (P2): modifiers applied during CUA action execution were not captured in recorded agent-replay steps, so a cached replay re-ran a chorded click/scroll as a plain one. Thread modifiers through the selector-based path symmetrically with the coordinate path:

- Add a shared `cdpModifierMask` helper (understudy/modifiers.ts); reuse it in Page (dedupes the previous private copy) and add modifier support to Locator.click via the CDP mouse-event modifiers bitmask.
- Action gains optional `modifiers`; performUnderstudyMethod forwards them to the locator click; takeDeterministicAction passes action.modifiers.
- The CUA handler records modifiers on the replay step for click (Action) and scroll (AgentReplayScrollStep); AgentCache re-applies them on replay.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
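For readers unfamiliar with the CDP modifiers bitmask, a helper along these lines would produce it. The bit values (Alt=1, Ctrl=2, Meta/Command=4, Shift=8) are those documented for CDP's Input.dispatchMouseEvent. The `Modifier` type and the function signature are assumptions, so the real `cdpModifierMask` in understudy/modifiers.ts may be shaped differently.

```typescript
// Modifier names are an assumption; the bit values follow the CDP documentation.
type Modifier = "Alt" | "Control" | "Meta" | "Shift";

const CDP_MODIFIER_BITS: Record<Modifier, number> = {
  Alt: 1,
  Control: 2,
  Meta: 4,
  Shift: 8,
};

// OR the bits together; repeated modifiers have no extra effect.
function cdpModifierMask(modifiers: readonly Modifier[] = []): number {
  return modifiers.reduce((mask, m) => mask | CDP_MODIFIER_BITS[m], 0);
}
```

A Ctrl+Shift click would carry a mask of 10 (2 | 8). Storing the modifier list on the replay step, rather than the computed mask, keeps the recorded step readable.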

…est)

Enable the expanded Navigator tool set (browser_tools_expanded-20260403) and make it the default for yutori/n1.5-latest, on top of the core coordinate tools.

- DOM tools backed by Stagehand's a11y snapshot + deepLocator:
  - extract_elements / find render the hybrid accessibility tree in Navigator's format, minting stable ref_N tokens (NavigatorRefRegistry).
  - set_element_value resolves a ref to its xpath and fills via deepLocator.
  - execute_js evaluates JS in the page (expression-first, body fallback).
- ref-targeted coordinate tools: click/scroll/etc. may carry a `ref` instead of coordinates; it resolves to the element's on-screen center (deepLocator centroid, scroll-into-view), taking priority over model coordinates and falling back to them. A ref'd scroll scrolls the element into view.
- The CUA handler supplies a generic page bridge (a11y snapshot + evaluate + elementCenter); all Navigator-specific logic stays in the client.
- Unknown/stale refs return a recoverable error so the model re-extracts.

Unit tests cover rendering/find/ref resolution + the four-tool dispatch, tool-set selection, scroll-into-view, and stale-ref handling; a YUTORI_API_KEY-gated integration spec exercises the tools live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
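The ref minting and stale-ref recovery described in this commit could be sketched as below. This is a hypothetical stand-in, not the PR's NavigatorRefRegistry: the class name, the xpath-keyed lookup, the result type, and the error text are all assumptions. "Stable" is interpreted here as "the same xpath always maps to the same ref_N".

```typescript
// Result of resolving a ref: either an xpath or a recoverable error message.
type RefResolution =
  | { ok: true; xpath: string }
  | { ok: false; error: string };

// Hypothetical registry that mints ref_N tokens for element xpaths.
class RefRegistry {
  private readonly refToXpath = new Map<string, string>();
  private readonly xpathToRef = new Map<string, string>();
  private nextId = 1;

  // Return the existing ref for an xpath, or mint the next ref_N.
  mint(xpath: string): string {
    const existing = this.xpathToRef.get(xpath);
    if (existing !== undefined) return existing;
    const ref = `ref_${this.nextId++}`;
    this.refToXpath.set(ref, xpath);
    this.xpathToRef.set(xpath, ref);
    return ref;
  }

  // Unknown refs produce an error value instead of throwing,
  // so the caller can report it to the model and continue the loop.
  resolve(ref: string): RefResolution {
    const xpath = this.refToXpath.get(ref);
    if (xpath === undefined) {
      return {
        ok: false,
        error: `Unknown or stale ref "${ref}"; re-run extract_elements to get fresh refs.`,
      };
    }
    return { ok: true, xpath };
  }
}
```

Returning an error value rather than throwing matches the commit's intent that the model can recover by extracting again.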

github-actions Bot added the external-contributor (tracks PRs mirrored from external contributor forks) and external-contributor:awaiting-approval (waiting for a stagehand team member to approve the latest external commit) labels Jun 5, 2026

lawrencechen98 force-pushed the yutori-navigator-cua-upstream branch from 931e0b7 to ce6c06e June 5, 2026 22:15

lawrencechen98 marked this pull request as ready for review June 5, 2026 22:27

cubic-dev-ai Bot reviewed Jun 5, 2026


Comment thread packages/core/lib/v3/agent/YutoriCUAClient.ts

Comment thread packages/core/lib/v3/handlers/v3CuaAgentHandler.ts Outdated

lawrencechen98 and others added 3 commits June 9, 2026 10:36

lawrencechen98 wants to merge 4 commits into browserbase:main from yutori-ai:yutori-navigator-cua-upstream
