Feature: Optional indexed search backend (C# search service)#18551
Feature: Optional indexed search backend (C# search service)#18551Wingingbump wants to merge 10 commits into
Conversation
Adds an opt-in fast-search backend behind a new ISearchProvider seam.
Default behavior is byte-identical to upstream; the new path is taken
only when FILES_SEARCH_PROVIDER=Indexed is set. Seeking maintainer
direction (see docs/proposal.md) before any upstream PRs.
Components:
- src/search-service/ Rust gRPC service: Tantivy filename index,
FindFirstFileExW + rayon enumerator, notify
watcher, PROCESS_MODE_BACKGROUND_BEGIN +
battery/fullscreen/load throttling. 12 tests.
- src/Files.SearchAbstraction/ ISearchProvider interface + DTOs.
- src/Files.LegacySearch/ Wraps Windows.Storage.Search/AQS.
- src/Files.IndexedSearch.Client/ gRPC client (TCP for v0).
- src/Files.App/.../SearchRouter.cs Drop-in for FolderSearch; routes
to indexed when in scope, falls back to legacy
on glob/AQS/library/service-down.
- tests/Files.Search.Bench/ 200-query harness with JSON output.
- tests/corpora/ Deterministic corpus generator.
Bench (5k smoke corpus): indexed is ~595x faster than legacy fallback
on substring queries. Big O analysis projects the gap to widen at
larger corpora (legacy is O(N) per query when the path isn't in
Windows Search Indexer's catalog; indexed is O(log N) always). See
docs/decisions/0003-bench-strategy-theoretical.md for the projection
methodology.
Build env: requires VS 2026 (v145 toolset for .NET 10).
Files.App.Launcher.vcxproj bumped to stdcpp20 + FilesLauncher.cpp uses
::towupper for C++20 compatibility (see project_build_env memory).
Replaces the Rust PoC sidecar with a pure C# Windows Service (Files.SearchService) that ships the same gRPC wire format and ISearchProvider abstraction, removing the Rust toolchain from the project's build matrix. Search side: - In-memory inverted + trigram filename index with atomic reference-swap publish; two-tier scoring + score-then-truncate. - USN journal enumeration when running as LocalSystem; DirectoryInfo walk fallback for dev mode. - ChangeWatcher + 250ms-debounced EventBatcher for live updates; overflow triggers a full rebuild without losing events. - ProcessThrottle: background priority + battery/fullscreen/CPU polling. - IndexPersistence: binary format with magic+version, warm-start reconcile against disk on service launch. - Kestrel gRPC over named pipe in packaged/SCM mode, TCP loopback in dev. Named pipe DACL grants AuthenticatedUsers ReadWrite | Synchronize — Synchronize is required for NamedPipeClientStream's async connect across the LocalSystem/user-session boundary, was the source of the prior UnauthorizedAccessException. App side: - SearchRouter replaces FolderSearch as the call-site seam; routes glob/ AQS/Home/Library to legacy, everything else to indexed when enabled. - SearchServiceManager.EnsureRunning bridges packaged (ServiceController) and dev (HKCU\Run + direct launch) startup paths. - UseIndexedSearch toggle in Settings → Advanced, stored via IGeneralSettingsService. Env var FILES_SEARCH_PROVIDER kept as a dev override. - desktop6:Service declaration in Package.appxmanifest with StartAccount=localSystem. Debug manifest omits it so VS can sideload without admin. - WindowsAppSdkDeploymentManagerInitialize=false to skip the DeploymentManager auto-init that was throwing REGDB_E_CLASSNOTREG on packaged launches; the Main+Singleton packages already ship with the framework dependency, so the auto-init was redundant. Tests: - tests/Files.Search.Correctness (92 tests pass): FileIndex, Tokenizer, Scorer, Persistence, CorpusCorrectness. - tests/Files.Search.Probe: console smoke harness with bench/query/check subcommands. - tests/Files.Search.Bench wired with naive-scan, legacy (AQS), and indexed providers; same .proto + same scope adapter for fair compare. Bench: bench-results/baseline.json pinned (50k 'small' corpus, indexed TTFR p50=11ms, p99=88ms, total p99=210ms). Legacy AQS measured at 5k (2025ms TTFR) — full-scale legacy run deferred per ADR 0003 since it's O(N) per query off-Indexer and produces no gate-relevant information. Docs: rewrote CLAUDE.md, search-roadmap.md, README; added csharp-search-service.md (architecture + file map). Deleted proposal.md, improvements.md, and ADR 0002 (Rust-specific and superseded). Removed src/search-service/ — the Rust PoC stays in branch history via commit 534d784. Validated end-to-end in packaged/SCM mode: Get-Service shows FilesSearchService running as LocalSystem; cross-context named-pipe connect from a non-elevated user returns 'connected OK'. Files.App itself has a pre-existing packaged-launch crash unrelated to search (silent exit before managed code) that needs a separate pass before release shipping; the search-service infrastructure is independently proven.
Files.App exits silently before managed code runs on packaged install. Root cause unknown — captured timeline, ruled-out hypotheses, and concrete next debug steps so this can be picked up cold. Also notes two build-side issues: - First clean-tree MSBuild pass fails manifest validation due to Condition="Exists(...)" on the service-staging Content glob; succeeds on second pass once SearchService output exists on disk. - v143 platform toolset on Files.App.Launcher.vcxproj not present locally on VS 2026 machines.
The AppxManifest declares an OutOfProcessServer at Files.App.Server\Files.App.Server.exe, but the csproj was only staging the .winmd metadata at the top level — the exe and its deps were never copied into the package's Files.App.Server\ subdir. Adds a Content Include block mirroring the SearchService staging pattern. Also corrects EntryPoint="windows.FullTrustApplication" (lowercase 'w') to "Windows.FullTrustApplication" on the desktop6:Service extension. Microsoft docs specify capital W for the service entrypoint string. Neither fix resolves the Win10 19045 packaged-activation failure (see docs/packaged-build-debug-notes.md), but both are real package correctness bugs.
…on scripts Comment fixes: - SearchServiceManager.LaunchIfNotRunning now describes the actual TCP-loopback collision pattern instead of mentioning named pipes. Dev mode uses FILES_SEARCH_SERVICE_URL with TCP loopback, not pipes. - SearchRouter._serviceAvailable documents that the racing reads/writes are intentionally unsynchronized (the worst case is two concurrent first-searches each issuing an idempotent health probe). scripts/dev-cycle.ps1: one-shot stop-service + uninstall + build + install + activate cycle. Handles the manifest-validation retry on a clean tree. Switches platform via -Platform x64|x86|arm64. scripts/debug-activation.ps1: bundles the kernel-process ETW trace, AppXDeploymentServer / AppModel-Runtime / TWinUI / Application-log filters, and WER state inspection we needed for the Win10 19045 packaged-launch diagnosis. Writes a per-run output directory with one file per source plus a summary.
CLAUDE.md and the local debug/dev-cycle PowerShell helpers were used during development but aren't useful to upstream maintainers. README loses the now-dangling CLAUDE.md link.
Reverts five stray local-dev artifacts on top of upstream files-community/Files without touching any search-service wiring or DeploymentManager bug-fix. - Package.appxmanifest: Publisher CN=Tommy -> CN=Files - Package.appxmanifest: strip UTF-8 BOM - Files.App.csproj: AppxBundlePlatforms restored to x86|x64|arm64 - Files.App.csproj: remove AppxPackageSigningEnabled + local cert thumbprint - Files.App.csproj: restore 3-line MSBuild element formatting
Remove fork README, revert Files.App.Launcher build-env workarounds (towupper / stdcpp20), and drop the .claude/ gitignore entry so the diff against upstream contains only the search-service feature.
| @@ -0,0 +1,30 @@ | |||
| syntax = "proto3"; | |||
There was a problem hiding this comment.
Does using Google's Protocol Buffers in gRPC have any benefit over JSON? IIRC, protobuf is not a requirement for gRPC, and I'd insist on not adding more dependencies.
Need a decision on this @yair100
| /// All structures match the Windows SDK definitions for USN_RECORD_V2 | ||
| /// and MFT_ENUM_DATA_V0 used by FSCTL_ENUM_USN_DATA. | ||
| /// </summary> | ||
| internal static partial class NativeMethods |
There was a problem hiding this comment.
Perhaps Files.App.CsWin32 should be used instead?
| { | ||
| "small" => ("small", 50_000, 40L * 1024), // ~2 GiB | ||
| "medium" => ("medium", 500_000, 100L * 1024), // ~50 GiB | ||
| "large" => ("large", 2_000_000, 250L * 1024), // ~500 GiB |
There was a problem hiding this comment.
500gb is an overkill for testing
| internal sealed class Xorshift64 | ||
| { | ||
| private ulong _s; | ||
| public Xorshift64(ulong seed) { _s = seed == 0 ? 0xDEADBEEFCAFEBABEUL : seed; } |
There was a problem hiding this comment.
Is this seed fallback intentional?
| { | ||
| var records = new List<DocRecord> | ||
| { | ||
| new(@"C:\root\测试\测试_file.txt", "测试_file.txt", 512UL, DateTime.UtcNow), |
There was a problem hiding this comment.
Verify: Those characters may cause some issues with different environments and IDEs. Perhaps it's better to use a constant byte array and convert it to a string at runtime?
|
Instead of pulling the dependencies on gRPC, I think for such purpose we should build on top of WinRT so that the search service becomes an WinRT out-off-proc server. See the project |
|
@Josh65-2201
will need to fix |
There was a problem hiding this comment.
If this is just a benchmark results file, it should be deleted.
There was a problem hiding this comment.
Should be removed, https://github.com/files-community/Files/blob/main/src/Files.App/Package.appxmanifest is what used

Resolved / Related Issues
There is no
Ready to buildissue number for this yet — this PR is opened at a maintainer's request in Discord to start the design discussion around a faster, optional search backend. Happy to file/link a tracking issue if preferred.Closes #5845
Summary
Adds an opt-in indexed search backend alongside the existing
Windows.Storage.Searchpath. The current provider remains the default; nothing changes unless the user explicitly enables indexed search.files-search-service.exe) maintains an in-memory inverted + trigram filename index over the user's home directory, kept live by aReadDirectoryChangesWwatcher with process throttling so it stays out of the way.ISearchProviderabstraction (Files.SearchAbstraction), with two implementations:Files.IndexedSearch.Client(new) andFiles.LegacySearch(wraps today's behavior).SearchRouterselects the provider, caches service health, and falls back to legacy if the service is unavailable.FILES_SEARCH_PROVIDER=Indexed).Package.Debug.appxmanifestomits the SCM service declaration so VS can sideload without admin; the service is spawned as a child process instead.Benchmark on a ~1M-file corpus: first-result latency ~11 ms indexed vs ~2025 ms legacy. Pinned baseline checked in at
bench-results/baseline.json; reproduce withrun-bench.ps1. Design notes indocs/csharp-search-service.md.Steps used to test these changes
dotnet run -c Release --project tests/Files.Search.Correctness/Files.Search.Correctness.csproj→ 92/92 pass (tokenizer, scorer, index, persistence, corpus correctness).SearchRouterfalls back to legacy search without errors.Windows.Storage.Searchpath.