feat: add 9 advanced automation tools with DPI coordinate system support by Vaibhav-api-code · Pull Request #93 · CursorTouch/Windows-MCP

Vaibhav-api-code · 2026-03-08T04:53:21Z

Summary

Adds 9 new tools to Windows-MCP that expand its desktop automation capabilities, bringing it closer to feature parity with macOS automation solutions. All coordinate-accepting tools support a coordinate_system parameter ("physical" or "logical") for DPI-aware operation.

New Tools

Tool	Description	Key Features
CursorPosition	Get current mouse (x, y) coordinates	Read-only, no deps
PixelColor	Get RGB color at screen coordinates	Hex code + nearest named color (20-color palette)
KeyHold	Press/release keys independently (`down`/`up`)	40+ key names (shift, ctrl, alt, f1-f12, arrows, etc.)
ScreenInfo	Get screen dimensions and DPI scaling	Virtual screen size + scale factor
ScreenHighlight	Highlight a screen region with colored rectangle	GDI overlay with auto-cleanup, 4 colors
MousePath	Move mouse along a multi-point path	Bezier smoothing, configurable duration
ScreenReader	OCR - read text from screen region	Windows OCR (built-in) + pytesseract fallback
WaitForChange	Wait until screen region visually changes	Pixel-by-pixel comparison, configurable threshold
FindImage	Template matching - find image on screen	OpenCV-based, returns center coords + confidence
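
To illustrate the "nearest named color" feature listed for PixelColor, here is a minimal sketch. It is not the PR's code: the palette is a hypothetical five-entry subset (the PR describes a 20-color table), the function names are illustrative, and squared Euclidean distance in RGB space is an assumption about how "nearest" is measured.

```python
# Hypothetical mini-palette; the PR's real lookup table has 20 entries.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
}


def nearest_color_name(rgb: tuple[int, int, int]) -> str:
    """Return the palette name closest to rgb by squared RGB distance."""
    def dist(name: str) -> int:
        pr, pg, pb = PALETTE[name]
        return (rgb[0] - pr) ** 2 + (rgb[1] - pg) ** 2 + (rgb[2] - pb) ** 2

    return min(PALETTE, key=dist)


def to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an (r, g, b) tuple as a #RRGGBB hex string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)
```

Plain RGB distance is cheap and dependency-free, though it does not track perceived color difference closely; for a coarse palette that trade-off is usually acceptable.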

DPI Coordinate System

All 6 coordinate-accepting tools support a coordinate_system parameter:

"physical" (default) — raw pixel coordinates, no conversion
"logical" — coordinates are multiplied by the system DPI scale factor

Three internal helpers handle the conversion:

_to_physical(loc, system) — for [x, y] coordinates
_region_to_physical(region, system) — for [x, y, w, h] regions
_path_to_physical(path, system) — for [[x,y], ...] waypoint lists
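
To make the two modes concrete, here is a small standalone sketch of the arithmetic. It takes the scale factor as an explicit argument (an assumption for illustration; the PR's helpers read it from the desktop service), and the function names are hypothetical.

```python
def scale_point(loc: list[int], scale: float) -> list[int]:
    """Scale a logical [x, y] point to physical pixels."""
    return [round(loc[0] * scale), round(loc[1] * scale)]


def scale_region(region: list[int], scale: float) -> list[int]:
    """Scale a logical [x, y, w, h] region to physical pixels."""
    return [round(v * scale) for v in region]


# At 150% scaling, logical (100, 200) maps to physical (150, 300).
print(scale_point([100, 200], 1.5))
# At 200% scaling, every component of the region doubles.
print(scale_region([10, 20, 50, 40], 2.0))
```

Note that Python's `round` rounds halves to the nearest even integer, so a logical coordinate of 3 at 150% scaling (4.5) becomes 4 rather than 5.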

Dependencies

No new required dependencies. Tools that do not need the optional extras work with the existing pillow + pywin32 stack.

Optional dependency groups added to pyproject.toml:

pip install 'windows-mcp[vision]'  # opencv-python-headless, numpy
pip install 'windows-mcp[ocr]'     # pytesseract
pip install 'windows-mcp[all]'     # everything

When an optional dependency is missing, the affected tool degrades gracefully and returns a message with clear install instructions.
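
A minimal sketch of one way such a check can be written, assuming a lookup by top-level module name. The helper names and the message wording are hypothetical and not taken from the PR.

```python
import importlib.util
from typing import Optional


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def require_extra(extra: str, modules: list[str]) -> Optional[str]:
    """Return an install hint if any module is missing, else None."""
    missing = missing_modules(modules)
    if not missing:
        return None
    return (
        f"Error: missing optional dependencies: {', '.join(missing)}. "
        f"Install with: pip install 'windows-mcp[{extra}]'"
    )
```

A tool body could call `require_extra("vision", ["cv2", "numpy"])` first and return the hint if it is not `None`, importing the heavy modules only afterwards.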

Code Changes

This PR is purely additive — no existing code is modified except one import line in service.py (adding _approximate_color_name to the utils import).

File	Changes
`src/windows_mcp/__main__.py`	+288 lines: 3 DPI helpers + 9 tool registrations
`src/windows_mcp/desktop/service.py`	+384 lines: 9 implementation methods + 2 constant dicts
`src/windows_mcp/desktop/utils.py`	+36 lines: color name lookup table + helper
`pyproject.toml`	+12 lines: optional dependency groups
`README.md`	+9 lines: new tools in tools table
`tests/`	+10 new test files (899 lines total)

Testing

10 comprehensive test files with 80+ test cases covering:

All 9 tools with success paths and error handling
DPI coordinate conversion (physical passthrough, logical scaling at 100%/150%/200%)
Edge cases: unknown keys, invalid coordinates, missing optional deps
GDI handle validation, BMP format handling, template matching

All tests use unittest.mock to avoid Windows-specific runtime dependencies.
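
As a rough sketch of the mocking approach, the following shows how a DPI conversion test can run without Windows. The `to_physical` function is a stand-in that takes the desktop object as an argument (the PR's helper uses a module-level `desktop`), and the test names are illustrative.

```python
from unittest.mock import MagicMock


def to_physical(loc, coordinate_system, desktop):
    """Stand-in for the PR's helper, with the desktop passed explicitly."""
    if coordinate_system == "logical":
        scale = desktop.get_dpi_scaling()
        return [round(loc[0] * scale), round(loc[1] * scale)]
    return loc


def test_logical_scales_by_dpi():
    desktop = MagicMock()
    desktop.get_dpi_scaling.return_value = 1.5  # pretend 150% scaling
    assert to_physical([100, 200], "logical", desktop) == [150, 300]
    desktop.get_dpi_scaling.assert_called_once()


def test_physical_is_passthrough():
    desktop = MagicMock()
    assert to_physical([100, 200], "physical", desktop) == [100, 200]
    # Physical mode should never query the DPI scale.
    desktop.get_dpi_scaling.assert_not_called()
```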

Test Plan

All new tests pass with pytest tests/
No modifications to existing tests
Diff is purely additive (1528 insertions, 1 deletion for import update)
Code follows existing project patterns (@mcp.tool(), @with_analytics, Desktop class methods)
3 rounds of code review completed (addressed 2 CRITICAL, 2 HIGH, 4 MEDIUM issues)

🤖 Generated with Claude Code

Add cursor position, pixel color, key hold/release, screen info, highlight region, mouse path, OCR screen reader, wait-for-change, and find-image (template matching) tools.

All coordinate-accepting tools support a `coordinate_system` parameter ("physical" or "logical") for DPI-aware operation.

New optional dependencies for vision (opencv) and OCR (pytesseract) in pyproject.toml. Includes 10 comprehensive test files (80+ tests).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Copilot

Pull request overview

This PR expands windows-mcp desktop automation by adding 9 new MCP tools and related Desktop service implementations, plus DPI-aware coordinate conversion helpers and optional dependency extras for OCR/vision features.

Changes:

Added 3 DPI coordinate conversion helpers (_to_physical, _region_to_physical, _path_to_physical) and registered 9 new MCP tools in __main__.py.
Implemented the 9 new tool backends in Desktop (cursor position, pixel color, key hold, screen info, highlight, mouse path, OCR, change detection, template matching).
Added color-name approximation utility, optional dependency groups (vision, ocr, all), README tool list updates, and a new test suite covering the new tools/helpers.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.


File	Description
`src/windows_mcp/__main__.py`	Adds DPI coordinate conversion helpers and registers new MCP tools that wrap `Desktop` methods.
`src/windows_mcp/desktop/service.py`	Adds tool implementations in `Desktop`, plus VK/color constants for KeyHold/Highlight.
`src/windows_mcp/desktop/utils.py`	Adds named-color palette and `_approximate_color_name` helper for PixelColor output.
`pyproject.toml`	Adds optional dependency extras for vision (OpenCV/numpy) and OCR (pytesseract).
`README.md`	Documents the newly added tools in the README tool list.
`tests/test_cursor_position.py`	Tests CursorPosition behavior via mocked UIA cursor coordinates.
`tests/test_pixel_color.py`	Tests PixelColor output formatting and color name approximation helper.
`tests/test_key_hold.py`	Tests key hold/release behavior and VK map essentials.
`tests/test_screen_info.py`	Tests monitor parsing and fallback behavior for ScreenInfo.
`tests/test_highlight.py`	Tests ScreenHighlight input validation and color map presence.
`tests/test_mouse_path.py`	Tests mouse path validation and endpoint visitation behavior.
`tests/test_screen_reader.py`	Tests OCR flow, region capture bbox behavior, and error handling.
`tests/test_wait_for_change.py`	Tests change detection, timeout, invalid input, and baseline capture errors.
`tests/test_find_image.py`	Tests missing deps messaging, path/extension validation, and match/no-match flows.
`tests/test_coordinate_system.py`	Tests DPI conversion helper behavior for physical/logical coordinate modes.


Copilot · 2026-03-08T05:02:21Z

src/windows_mcp/__main__.py

+def _to_physical(loc: list[int], coordinate_system: str) -> list[int]:
+    """Convert coordinates to physical space if needed.
+
+    Args:
+        loc: [x, y] coordinates.
+        coordinate_system: "physical" (no conversion) or "logical" (multiply by DPI scale).
+
+    Returns:
+        [x, y] in physical coordinates ready for pyautogui.
+    """
+    if coordinate_system == "logical":
+        if desktop is None:
+            raise RuntimeError("Desktop service is not initialized.")
+        scale = desktop.get_dpi_scaling()
+        return [round(loc[0] * scale), round(loc[1] * scale)]
+    return loc
+
+
+def _region_to_physical(region: list[int], coordinate_system: str) -> list[int]:
+    """Convert a region [x, y, width, height] to physical space if needed."""
+    if coordinate_system == "logical":
+        if desktop is None:
+            raise RuntimeError("Desktop service is not initialized.")
+        scale = desktop.get_dpi_scaling()
+        return [round(v * scale) for v in region]
+    return region
+
+
+def _path_to_physical(path: list[list[int]], coordinate_system: str) -> list[list[int]]:
+    """Convert a list of [x, y] waypoints to physical space if needed."""
+    if coordinate_system == "logical":
+        if desktop is None:
+            raise RuntimeError("Desktop service is not initialized.")
+        scale = desktop.get_dpi_scaling()
+        return [[round(p[0] * scale), round(p[1] * scale)] for p in path]
+    return path


In logical mode _to_physical and _path_to_physical index into loc / each waypoint without validating shape first. For invalid inputs (e.g., loc=[100] or a malformed waypoint), this raises IndexError and the tool wrapper returns a generic Error: list index out of range instead of the intended "loc must be [x, y]" / "waypoint must be [x, y]" messages. Consider validating lengths in the helpers (or before calling them) and raising a clear ValueError/returning an unchanged value so the downstream validation runs.
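
A minimal sketch of the first option (validate in the helper and raise `ValueError`). The `scale` argument stands in for `desktop.get_dpi_scaling()` so the snippet runs on its own; the function name and message wording are illustrative, not part of the PR.

```python
def to_physical_checked(
    loc: list[int], coordinate_system: str, scale: float
) -> list[int]:
    """Validate the shape of loc before scaling so errors stay readable."""
    if not isinstance(loc, (list, tuple)) or len(loc) != 2:
        raise ValueError("loc must be [x, y]")
    if coordinate_system == "logical":
        return [round(loc[0] * scale), round(loc[1] * scale)]
    return list(loc)
```

With this shape check in place, `to_physical_checked([100], "logical", 1.5)` raises `ValueError("loc must be [x, y]")` instead of an `IndexError`.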

Copilot · 2026-03-08T05:02:22Z