feat: Screenshot tool with DXCam backend reporting and UIAutomation hang fix #104
Open
yasuhirofujii-medley wants to merge 4 commits into CursorTouch:main from
added 4 commits
March 13, 2026 09:25
Track the backend (dxcam/pillow) used by get_screenshot() and store it in DesktopState.screenshot_backend. Add a 'Screenshot Backend: dxcam/pillow' line to the response text. The Control Node parses this line and shows it in its logs, so it can be confirmed remotely whether DirectX capture is enabled.
When use_ui_tree=False (Screenshot tool), skip get_controls_handles / get_windows / get_active_window. These UIAutomation APIs can hang while an application is starting up, and there were cases where Screenshot was blocked for 47 seconds or more.
uv cache fetch corrupted multi-byte (Japanese) characters in comments, causing newlines to be swallowed and merging the if-statement into the comment line, resulting in IndentationError on startup.
Summary
This PR adds a dedicated Screenshot tool for fast screenshot-only capture, reports the capture backend (DXCam/Pillow) in the response, and skips expensive UIAutomation window enumeration in the Screenshot fast path.
These changes build on top of the use_ui_tree=False fast path introduced in PR #98.

Why this is needed

1. Screenshot tool: dedicated fast capture endpoint (65a9ed3)

Problem: The existing Snapshot tool, even with use_ui_tree=False, still carries the overhead of being a general-purpose tool. Callers who only need a screenshot have to specify multiple flags (use_vision=True, use_annotation=False, use_ui_tree=False). More importantly, there was no way to invoke a screenshot-only path with a simple, discoverable tool name.

Solution: Added a new Screenshot tool that is purpose-built for fast screenshot capture:
- use_vision=True, use_annotation=False, use_ui_tree=False
- display parameter (list of display indices) for multi-monitor selection
- display is specified (requires capture_rect)

Also added:
- Desktop.parse_display_selection() for robust display parameter handling
- Desktop.get_display_union_rect() for computing the capture region from display indices
- _capture_desktop_state() helper to deduplicate Snapshot/Screenshot implementation
- WINDOWS_MCP_PROFILE_SNAPSHOT env var for per-stage timing instrumentation

2. Capture backend reporting (5484e46)

Problem: When debugging screenshot performance, there was no way to tell from the tool response whether DXCam (DirectX, ~10ms) or Pillow (GDI, ~100ms) was used for capture. This made it difficult to confirm that DXCam was actually being activated.
Solution: The get_screenshot() method now tracks the backend used (self._last_screenshot_backend), and the response includes a "Screenshot Backend: dxcam" or "Screenshot Backend: pillow" line. The DesktopState dataclass carries a screenshot_backend field.

3. Skip UIAutomation window enumeration for Screenshot tool (5b22d1b, 3d751df)

Problem: Desktop.get_state() unconditionally called get_controls_handles(), get_windows(), and get_active_window(), even when use_ui_tree=False (Screenshot tool). These are UIAutomation API calls that enumerate windows via COM/WM messages. When an application is launching and not responding to window messages (e.g., showing a splash screen), these calls hang for tens of seconds (observed: 47 seconds for a single screenshot).

This is the same class of problem that PR #98 addressed for tree capture, but the window enumeration calls were left in place because the Snapshot response includes window metadata. For the Screenshot tool, however, this metadata is not needed: the purpose is strictly to capture the screen image as fast as possible.
Solution: When use_ui_tree=False, get_state() now skips all three UIAutomation window enumeration calls and returns empty window lists. This eliminates the hang entirely for the Screenshot path.

The comment explaining this was initially written in Japanese, which caused an encoding corruption issue when uv fetched the package from GitHub: multi-byte characters were mangled, newlines were swallowed, and an if statement was merged into a comment line, producing an IndentationError on startup. The comment was rewritten in English to avoid this.

Changes
src/windows_mcp/__main__.py
- Screenshot tool with display parameter
- _capture_desktop_state() shared helper (used by both Snapshot and Screenshot)
- _snapshot_profile_enabled() and _as_bool() helpers
- _build_snapshot_response() to deduplicate response construction
- Screenshot Backend: line when available

src/windows_mcp/desktop/service.py
- get_state(): Skip get_controls_handles / get_windows / get_active_window when use_ui_tree=False
- get_screenshot(): Track _last_screenshot_backend (dxcam/pillow)
- parse_display_selection() for display parameter validation
- get_display_union_rect() for computing display capture region
- WINDOWS_MCP_PROFILE_SNAPSHOT=1

src/windows_mcp/desktop/views.py
- screenshot_backend: str | None field to DesktopState

src/windows_mcp/tree/service.py
- screen_box property (used as fallback root box when UI tree is skipped)

tests/test_snapshot_display_filter.py
- parse_display_selection()
- use_ui_tree=False tree skip + use_dom validation

Behavior
Default behavior (no breaking changes)
- Snapshot tool continues to work exactly as before

New Screenshot tool

{ "tool": "Screenshot", "display": [0] }

Returns a fast screenshot with DXCam backend (when available), no UI tree, no window enumeration.
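To illustrate how a display list such as [0] could be turned into a capture region, here is a minimal sketch. It is not the PR's code: Rect, select_displays, and union_rect are hypothetical names, and the real Desktop.parse_display_selection() and Desktop.get_display_union_rect() may validate input and represent rectangles differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rect:
    """Hypothetical monitor rectangle in virtual-desktop pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def select_displays(display: Optional[Sequence[int]], count: int) -> Optional[list[int]]:
    """Validate a display selection; None is taken to mean 'all displays'."""
    if display is None:
        return None
    selected: list[int] = []
    for index in display:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            raise ValueError(f"display index must be an integer: {index!r}")
        if not 0 <= index < count:
            raise ValueError(f"display index out of range: {index}")
        if index not in selected:  # drop duplicates, keep first-seen order
            selected.append(index)
    if not selected:
        raise ValueError("display selection is empty")
    return selected


def union_rect(monitors: Sequence[Rect], indices: Sequence[int]) -> Rect:
    """Smallest rectangle that covers every selected monitor."""
    chosen = [monitors[i] for i in indices]
    return Rect(
        left=min(m.left for m in chosen),
        top=min(m.top for m in chosen),
        right=max(m.right for m in chosen),
        bottom=max(m.bottom for m in chosen),
    )
```

With two side-by-side 1920x1080 monitors, selecting [0] yields the first monitor's rectangle, while [0, 1] yields a single rectangle spanning both.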
Performance impact
Testing
python -m pytest -q tests/test_snapshot_display_filter.py # 11 passed
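Beyond the unit tests, the backend actually used can be checked from the caller side by reading the Screenshot Backend: line in the response text. The sketch below shows one way a client might do that. The function name parse_screenshot_backend is hypothetical, and the sketch assumes the line sits on its own line in the format described above. The rest of the response layout is not assumed.

```python
import re
from typing import Optional

# Matches a standalone line such as "Screenshot Backend: dxcam".
_BACKEND_LINE = re.compile(r"^Screenshot Backend:\s*(\S+)\s*$", re.MULTILINE)


def parse_screenshot_backend(response_text: str) -> Optional[str]:
    """Return the reported backend name in lowercase, or None if the line is absent."""
    match = _BACKEND_LINE.search(response_text)
    return match.group(1).lower() if match else None
```

A caller could log the returned value after each Screenshot call and treat None as "backend not reported", which is what an older server without this change would produce.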