Skip to content

fix(screenshot): apply maxImageDimension cap to match get_window_state#1691

Open
hippoley wants to merge 1 commit into
trycua:mainfrom
hippoley:fix/screenshot-max-image-dimension
Open

fix(screenshot): apply maxImageDimension cap to match get_window_state#1691
hippoley wants to merge 1 commit into
trycua:mainfrom
hippoley:fix/screenshot-max-image-dimension

Conversation

@hippoley
Copy link
Copy Markdown
Contributor

@hippoley hippoley commented May 25, 2026

Problem (Bug 1 from #1592)

screenshot returns native Retina resolution (2× logical) while click coordinates are calibrated against the downscaled image from get_window_state. An agent using screenshot to see the screen and then calling click with those pixel coordinates misses every target on a Retina display.

Root cause: ScreenshotTool called captureWindow(windowID:format:quality:) without maxImageDimension (defaults to 0 = no resize). GetWindowStateTool passes config.maxImageDimension and registers the resize ratio in ImageResizeRegistry so ClickTool can reverse it.

Fix

Load ConfigStore.shared in ScreenshotTool and pass config.maxImageDimension to captureWindow, matching the GetWindowStateTool code path exactly.

Before: captureWindow(windowID:, format:, quality:) → native 2× resolution

After: captureWindow(windowID:, format:, quality:, maxImageDimension: config.maxImageDimension) → same downscaled resolution as get_window_state

Fixes #1592 (Bug 1)

Summary by CodeRabbit

  • Bug Fixes
    • Screenshots now respect maximum image dimension constraints from configuration, preventing the generation of excessively large images that could impact performance and resource usage.

Review Change Stack

Fixes trycua#1592 (Bug 1).

ScreenshotTool called captureWindow(windowID:format:quality:) without
maxImageDimension, so it defaulted to 0 (no resize) and returned native
Retina resolution (2x logical). GetWindowStateTool passes
config.maxImageDimension and registers the resize ratio in
ImageResizeRegistry so ClickTool can reverse it.

An agent using screenshot to see the screen and click with those pixel
coordinates would miss every target on a Retina display because the
coordinate spaces were mismatched.

Fix: load ConfigStore.shared and pass config.maxImageDimension to
captureWindow, matching the get_window_state code path exactly.
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 25, 2026

@nishantpurohit04 is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31810e3c-949d-486f-9291-e6f4eab3a7ed

📥 Commits

Reviewing files that changed from the base of the PR and between 8137a3d and 1890539.

📒 Files selected for processing (1)
  • libs/cua-driver/swift/Sources/CuaDriverServer/Tools/ScreenshotTool.swift

📝 Walkthrough

Walkthrough

Screenshot tool now loads shared configuration via ConfigStore.shared.load() and passes config.maxImageDimension into captureWindow, applying the same dimension cap used by get_window_state. Previously, screenshot was called without a cap parameter, returning native-resolution images.

Changes

Screenshot Config-Based Dimension Capping

Layer / File(s) Summary
Config-based screenshot dimension capping
libs/cua-driver/swift/Sources/CuaDriverServer/Tools/ScreenshotTool.swift
screenshot method loads ConfigStore.shared and threads config.maxImageDimension into the captureWindow call, applying the same dimension cap as get_window_state to align image resolutions and fix coordinate space mismatch with click tool.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Possibly related PRs

  • trycua/cua#1663: Both PRs align on the screenshot max_image_dimension behavior — this Swift PR threads config.maxImageDimension into captureWindow, while the companion PR updates the Rust-side DriverConfig::default().max_image_dimension to 1568.

Suggested reviewers

  • ddupont808

Poem

🐰 A screenshot now respects the config's wisdom,
No more native pixels running in schism,
Click and capture, in harmony they dance,
Coordinates aligned, the agent's got a chance! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: applying maxImageDimension cap to screenshot to align with get_window_state behavior, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR addresses all coding requirements from issue #1592 Bug 1: loading ConfigStore, passing maxImageDimension to captureWindow, and aligning screenshot resolution with get_window_state.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the screenshot maxImageDimension issue; no unrelated modifications to other files or features are present.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

screenshot tool returns native-resolution images incompatible with click coordinate system (+ two related bugs)

1 participant