Skip to content

Latest commit

 

History

History
60 lines (39 loc) · 3.81 KB

File metadata and controls

60 lines (39 loc) · 3.81 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Windows-MCP is a Python MCP (Model Context Protocol) server that bridges AI LLM agents with the Windows OS, enabling direct desktop automation. It exposes 16 tools (App, Shell, Screenshot, Snapshot, Click, Type, Scroll, Move, Shortcut, Wait, Scrape, MultiSelect, MultiEdit, Clipboard, Process, Notification) via FastMCP.

Build & Development Commands

uv sync                    # Install dependencies
uv run windows-mcp         # Run the MCP server
ruff format .              # Format code
ruff check .               # Lint code
ruff check --fix .         # Lint and auto-fix
pytest                     # Run all tests (if tests/ exists)
pytest tests/test_foo.py   # Run a single test file

Package manager: UV (not pip). Python: 3.13+. Build backend: Hatchling.

Architecture

The codebase follows a layered service architecture under src/windows_mcp/:

Entry point__main__.py: Registers all 15 MCP tools on a FastMCP server instance. Uses an async lifespan to initialize Desktop, WatchDog, and Analytics services. Each tool function delegates to Desktop methods. The @with_analytics decorator wraps tools for telemetry.

Desktop servicedesktop/service.py: High-level orchestrator. Manages window operations (launch, resize, switch), screenshots, mouse/keyboard actions, and clipboard. Interfaces with Tree service for UI element discovery. desktop/views.py defines data models: DesktopState, Window, Size, BoundingBox, Status.

Tree servicetree/service.py: Captures the Windows accessibility tree from active and background windows. Identifies interactive elements and scrollable areas. Uses ThreadPoolExecutor for multi-threaded UI traversal. tree/views.py defines TreeElementNode, ScrollElementNode, TreeState. tree/config.py has control type configurations.

UIAutomation wrapperuia/: Low-level abstraction over the Windows UIAutomation COM API via comtypes. core.py wraps the main automation object, controls.py has control-specific logic, patterns.py wraps UIAutomation patterns, enums.py has COM enumerations, events.py handles event subscriptions.

WatchDogwatchdog/service.py: Runs in a separate thread monitoring UI focus changes via UIAutomation events. Notifies the Tree service of focus changes so the accessibility tree stays current.

Virtual Desktop Managervdm/core.py: Tracks which windows belong to which Windows virtual desktop (Win10/11).

Analyticsanalytics.py: Optional PostHog telemetry (disabled with ANONYMIZED_TELEMETRY=false env var). Tracks tool names and errors only, not arguments or outputs.

Code Style

  • Formatter/linter: Ruff (line length 100, double quotes)
  • Naming: PEP 8 — snake_case functions/variables, PascalCase classes, UPPER_CASE constants
  • Type hints required on function signatures
  • Google-style docstrings for public functions/classes

Key Design Details

  • Screenshots are capped to 1920x1080 for token efficiency
  • Mouse/keyboard input uses UIA (same coordinate space as BoundingRectangle; no DPI mismatch)
  • Screenshot is the preferred fast visual-context tool; Snapshot is the heavier path for UI element ids and DOM extraction
  • Browser detection (Chrome, Edge, Firefox) triggers special DOM extraction mode in Snapshot
  • Fuzzy string matching (thefuzz) is used for element name matching
  • UI element fetching has retry logic (THREAD_MAX_RETRIES=3 in tree service)
  • The server supports stdio, SSE, and streamable HTTP transports

Security Context

This server has full system access with no sandboxing. Tools like Shell and App can perform irreversible operations. The recommended deployment target is a VM or Windows Sandbox.