Feature: Optional indexed search backend (C# search service) #18551

Open
Wingingbump wants to merge 10 commits into files-community:main from Wingingbump:feature/search-service-upstream

Wingingbump commented Jun 5, 2026

Resolved / Related Issues

There is no "Ready to build" issue number for this yet. This PR was opened at a maintainer's request on Discord to start the design discussion around a faster, optional search backend. Happy to file or link a tracking issue if preferred.

Closes #5845

Summary

Adds an opt-in indexed search backend alongside the existing Windows.Storage.Search path. The current provider remains the default; nothing changes unless the user explicitly enables indexed search.

  • A standalone C# Windows Service (files-search-service.exe) maintains an in-memory inverted + trigram filename index over the user's home directory, kept live by a ReadDirectoryChangesW watcher with process throttling so it stays out of the way.
  • Files.App talks to it over gRPC through a new ISearchProvider abstraction (Files.SearchAbstraction), with two implementations: Files.IndexedSearch.Client (new) and Files.LegacySearch (wraps today's behavior).
  • A SearchRouter selects the provider, caches service health, and falls back to legacy if the service is unavailable.
  • Opt-in via Settings → Advanced → "Use indexed search" (or FILES_SEARCH_PROVIDER=Indexed).
  • In Debug, a Package.Debug.appxmanifest omits the SCM service declaration so VS can sideload without admin; the service is spawned as a child process instead.
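
For readers who want the routing behavior in concrete terms, here is a minimal sketch of a router that caches a health probe and falls back to the legacy provider. It is not the PR's code: `ISearchProviderSketch`, `SearchRouterSketch`, `StubProvider`, and every method signature are hypothetical, and the real `ISearchProvider` in Files.SearchAbstraction may be shaped quite differently (for example, it may stream results).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified provider contract (an assumption, not the PR's interface).
public interface ISearchProviderSketch
{
    Task<bool> IsHealthyAsync(CancellationToken ct);
    Task<IReadOnlyList<string>> SearchAsync(string query, CancellationToken ct);
}

// Hypothetical router: picks a provider, caches service health, falls back to legacy.
public sealed class SearchRouterSketch
{
    private readonly ISearchProviderSketch _indexed;
    private readonly ISearchProviderSketch _legacy;
    private readonly Func<bool> _useIndexed; // stands in for the settings toggle
    private bool? _serviceAvailable;          // cached health; null means "not probed yet"

    public SearchRouterSketch(ISearchProviderSketch indexed, ISearchProviderSketch legacy, Func<bool> useIndexed)
    {
        _indexed = indexed;
        _legacy = legacy;
        _useIndexed = useIndexed;
    }

    public async Task<IReadOnlyList<string>> SearchAsync(string query, CancellationToken ct = default)
    {
        // Opt-in only: with the toggle off, the legacy path is always taken.
        if (!_useIndexed())
            return await _legacy.SearchAsync(query, ct);

        // Probe the service once and remember the answer.
        if (_serviceAvailable is null)
            _serviceAvailable = await _indexed.IsHealthyAsync(ct);

        if (_serviceAvailable == false)
            return await _legacy.SearchAsync(query, ct);

        try
        {
            return await _indexed.SearchAsync(query, ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // The service went away mid-session: remember that and fall back.
            _serviceAvailable = false;
            return await _legacy.SearchAsync(query, ct);
        }
    }
}

// Stand-in provider for demos and tests only.
public sealed class StubProvider : ISearchProviderSketch
{
    private readonly bool _healthy;
    private readonly string _tag;

    public StubProvider(bool healthy, string tag) { _healthy = healthy; _tag = tag; }

    public Task<bool> IsHealthyAsync(CancellationToken ct) => Task.FromResult(_healthy);

    public Task<IReadOnlyList<string>> SearchAsync(string query, CancellationToken ct)
    {
        if (!_healthy) throw new InvalidOperationException("service unavailable");
        return Task.FromResult<IReadOnlyList<string>>(new[] { _tag + ":" + query });
    }
}
```

With an unhealthy indexed stub and the toggle on, `SearchAsync("foo")` returns the legacy result. The sketch leaves the cached flag unsynchronized on purpose: a race between two first searches costs at most one extra health probe.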

Benchmark on a ~1M-file corpus: first-result latency ~11 ms indexed vs ~2025 ms legacy. Pinned baseline checked in at bench-results/baseline.json; reproduce with run-bench.ps1. Design notes in docs/csharp-search-service.md.

Steps used to test these changes

  1. dotnet run -c Release --project tests/Files.Search.Correctness/Files.Search.Correctness.csproj → 92/92 pass (tokenizer, scorer, index, persistence, corpus correctness).
  2. Built and launched Files.App; enabled Settings → Advanced → Use indexed search.
  3. Searched in a large directory and confirmed results stream in with the indexed provider; verified ranking/order against the legacy provider for the same queries.
  4. Stopped the search service and confirmed SearchRouter falls back to legacy search without errors.
  5. Toggled the setting off and confirmed behavior returns to the stock Windows.Storage.Search path.

Adds an opt-in fast-search backend behind a new ISearchProvider seam.
Default behavior is byte-identical to upstream; the new path is taken
only when FILES_SEARCH_PROVIDER=Indexed is set. Seeking maintainer
direction (see docs/proposal.md) before any upstream PRs.

Components:
- src/search-service/   Rust gRPC service: Tantivy filename index,
                        FindFirstFileExW + rayon enumerator, notify
                        watcher, PROCESS_MODE_BACKGROUND_BEGIN +
                        battery/fullscreen/load throttling. 12 tests.
- src/Files.SearchAbstraction/  ISearchProvider interface + DTOs.
- src/Files.LegacySearch/       Wraps Windows.Storage.Search/AQS.
- src/Files.IndexedSearch.Client/  gRPC client (TCP for v0).
- src/Files.App/.../SearchRouter.cs  Drop-in for FolderSearch; routes
                        to indexed when in scope, falls back to legacy
                        on glob/AQS/library/service-down.
- tests/Files.Search.Bench/     200-query harness with JSON output.
- tests/corpora/                Deterministic corpus generator.

Bench (5k smoke corpus): indexed is ~595x faster than legacy fallback
on substring queries. Big O analysis projects the gap to widen at
larger corpora (legacy is O(N) per query when the path isn't in
Windows Search Indexer's catalog; indexed is O(log N) always). See
docs/decisions/0003-bench-strategy-theoretical.md for the projection
methodology.

Build env: requires VS 2026 (v145 toolset for .NET 10).
Files.App.Launcher.vcxproj bumped to stdcpp20 + FilesLauncher.cpp uses
::towupper for C++20 compatibility (see project_build_env memory).

Replaces the Rust PoC sidecar with a pure C# Windows Service
(Files.SearchService) that ships the same gRPC wire format and
ISearchProvider abstraction, removing the Rust toolchain from the
project's build matrix.

Search side:
- In-memory inverted + trigram filename index with atomic
  reference-swap publish; two-tier scoring + score-then-truncate.
- USN journal enumeration when running as LocalSystem; DirectoryInfo
  walk fallback for dev mode.
- ChangeWatcher + 250ms-debounced EventBatcher for live updates;
  overflow triggers a full rebuild without losing events.
- ProcessThrottle: background priority + battery/fullscreen/CPU polling.
- IndexPersistence: binary format with magic+version, warm-start
  reconcile against disk on service launch.
- Kestrel gRPC over named pipe in packaged/SCM mode, TCP loopback in
  dev. Named pipe DACL grants AuthenticatedUsers ReadWrite | Synchronize;
  Synchronize is required for NamedPipeClientStream's async connect
  across the LocalSystem/user-session boundary, and its absence was the
  source of the prior UnauthorizedAccessException.
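
As an illustration of the "trigram index with atomic reference-swap publish" idea from the list above, here is a minimal sketch under stated assumptions. It covers only the trigram side, leaving out the inverted token index, scoring, and persistence. `TrigramIndexSketch` and all member names are hypothetical; the PR's actual data structures are not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical sketch: trigram postings over file names, published by swapping one reference.
public sealed class TrigramIndexSketch
{
    // Immutable snapshot: readers only ever see a fully built one.
    private sealed class Snapshot
    {
        public readonly string[] Names;
        public readonly Dictionary<string, HashSet<int>> Postings;

        public Snapshot(string[] names, Dictionary<string, HashSet<int>> postings)
        {
            Names = names;
            Postings = postings;
        }
    }

    private Snapshot _current =
        new Snapshot(Array.Empty<string>(), new Dictionary<string, HashSet<int>>());

    private static IEnumerable<string> Trigrams(string s)
    {
        s = s.ToLowerInvariant();
        for (int i = 0; i + 3 <= s.Length; i++)
            yield return s.Substring(i, 3);
    }

    // Build a fresh snapshot off to the side, then publish it with a single reference write.
    public void Rebuild(IEnumerable<string> fileNames)
    {
        var names = fileNames.ToArray();
        var postings = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < names.Length; id++)
        {
            foreach (var g in Trigrams(names[id]))
            {
                if (!postings.TryGetValue(g, out var set))
                {
                    set = new HashSet<int>();
                    postings[g] = set;
                }
                set.Add(id);
            }
        }
        Volatile.Write(ref _current, new Snapshot(names, postings));
    }

    // Case-insensitive substring search over file names.
    public List<string> Search(string query)
    {
        var snap = Volatile.Read(ref _current);
        var q = query.ToLowerInvariant();
        var grams = Trigrams(q).Distinct().ToList();

        IEnumerable<int> candidates;
        if (grams.Count == 0)
        {
            // Query shorter than three characters: no trigram to look up, so scan.
            candidates = Enumerable.Range(0, snap.Names.Length);
        }
        else
        {
            if (!snap.Postings.TryGetValue(grams[0], out var first))
                return new List<string>();
            var acc = new HashSet<int>(first);
            foreach (var g in grams.Skip(1))
            {
                if (!snap.Postings.TryGetValue(g, out var set))
                    return new List<string>();
                acc.IntersectWith(set);
            }
            candidates = acc;
        }

        // Trigram hits are only candidates; confirm each with a real substring check.
        return candidates
            .Where(id => snap.Names[id].ToLowerInvariant().IndexOf(q, StringComparison.Ordinal) >= 0)
            .OrderBy(id => id)
            .Select(id => snap.Names[id])
            .ToList();
    }
}
```

Because each query reads the snapshot reference once, a rebuild running on another thread never exposes a half-populated dictionary. The trade-off is that every publish allocates a complete new index.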

App side:
- SearchRouter replaces FolderSearch as the call-site seam; routes glob/
  AQS/Home/Library to legacy, everything else to indexed when enabled.
- SearchServiceManager.EnsureRunning bridges packaged (ServiceController)
  and dev (HKCU\Run + direct launch) startup paths.
- UseIndexedSearch toggle in Settings → Advanced, stored via
  IGeneralSettingsService. Env var FILES_SEARCH_PROVIDER kept as a dev
  override.
- desktop6:Service declaration in Package.appxmanifest with
  StartAccount=localSystem. Debug manifest omits it so VS can sideload
  without admin.
- WindowsAppSdkDeploymentManagerInitialize=false to skip the
  DeploymentManager auto-init that was throwing REGDB_E_CLASSNOTREG on
  packaged launches; the Main+Singleton packages already ship with the
  framework dependency, so the auto-init was redundant.

Tests:
- tests/Files.Search.Correctness (92 tests pass): FileIndex, Tokenizer,
  Scorer, Persistence, CorpusCorrectness.
- tests/Files.Search.Probe: console smoke harness with bench/query/check
  subcommands.
- tests/Files.Search.Bench wired with naive-scan, legacy (AQS), and
  indexed providers; same .proto + same scope adapter for fair compare.

Bench: bench-results/baseline.json pinned (50k 'small' corpus, indexed
TTFR p50=11ms, p99=88ms, total p99=210ms). Legacy AQS measured at 5k
(2025ms TTFR) — full-scale legacy run deferred per ADR 0003 since it's
O(N) per query off-Indexer and produces no gate-relevant information.

Docs: rewrote CLAUDE.md, search-roadmap.md, README; added
csharp-search-service.md (architecture + file map). Deleted
proposal.md, improvements.md, and ADR 0002 (Rust-specific and
superseded). Removed src/search-service/ — the Rust PoC stays in
branch history via commit 534d784.

Validated end-to-end in packaged/SCM mode: Get-Service shows
FilesSearchService running as LocalSystem; cross-context named-pipe
connect from a non-elevated user returns 'connected OK'. Files.App
itself has a pre-existing packaged-launch crash unrelated to search
(silent exit before managed code) that needs a separate pass before
release shipping; the search-service infrastructure is independently
proven.

Files.App exits silently before managed code runs on packaged install.
Root cause unknown — captured timeline, ruled-out hypotheses, and
concrete next debug steps so this can be picked up cold.

Also notes two build-side issues:
- First clean-tree MSBuild pass fails manifest validation due to
  Condition="Exists(...)" on the service-staging Content glob;
  succeeds on second pass once SearchService output exists on disk.
- v143 platform toolset on Files.App.Launcher.vcxproj not present
  locally on VS 2026 machines.

The AppxManifest declares an OutOfProcessServer at
Files.App.Server\Files.App.Server.exe, but the csproj was only staging
the .winmd metadata at the top level — the exe and its deps were never
copied into the package's Files.App.Server\ subdir. Adds a Content
Include block mirroring the SearchService staging pattern.

Also corrects EntryPoint="windows.FullTrustApplication" (lowercase 'w')
to "Windows.FullTrustApplication" on the desktop6:Service extension.
Microsoft docs specify capital W for the service entrypoint string.

Neither fix resolves the Win10 19045 packaged-activation failure (see
docs/packaged-build-debug-notes.md), but both are real package
correctness bugs.

…on scripts

Comment fixes:
- SearchServiceManager.LaunchIfNotRunning now describes the actual
  TCP-loopback collision pattern instead of mentioning named pipes.
  Dev mode uses FILES_SEARCH_SERVICE_URL with TCP loopback, not pipes.
- SearchRouter._serviceAvailable documents that the racing reads/writes
  are intentionally unsynchronized (the worst case is two concurrent
  first-searches each issuing an idempotent health probe).

scripts/dev-cycle.ps1: one-shot stop-service + uninstall + build +
install + activate cycle. Handles the manifest-validation retry on a
clean tree. Switches platform via -Platform x64|x86|arm64.

scripts/debug-activation.ps1: bundles the kernel-process ETW trace,
AppXDeploymentServer / AppModel-Runtime / TWinUI / Application-log
filters, and WER state inspection we needed for the Win10 19045
packaged-launch diagnosis. Writes a per-run output directory with one
file per source plus a summary.

CLAUDE.md and the local debug/dev-cycle PowerShell helpers were used
during development but aren't useful to upstream maintainers. README
loses the now-dangling CLAUDE.md link.

Reverts five stray local-dev artifacts on top of upstream files-community/Files
without touching any search-service wiring or DeploymentManager bug-fix.

- Package.appxmanifest: Publisher CN=Tommy -> CN=Files
- Package.appxmanifest: strip UTF-8 BOM
- Files.App.csproj: AppxBundlePlatforms restored to x86|x64|arm64
- Files.App.csproj: remove AppxPackageSigningEnabled + local cert thumbprint
- Files.App.csproj: restore 3-line MSBuild element formatting

Remove fork README, revert Files.App.Launcher build-env workarounds
(towupper / stdcpp20), and drop the .claude/ gitignore entry so the
diff against upstream contains only the search-service feature.

Wingingbump marked this pull request as ready for review June 5, 2026 20:35
yair100 added the "ready for review" label Jun 7, 2026

Josh65-2201 commented Jun 7, 2026

Member

How do you get the index service installed? Enabling Files settings > Advanced > Use index search engine doesn't prompt for admin to install one, nor does it create a process in Task Manager.

It also fails to do a search:
[image]

@@ -0,0 +1,30 @@
syntax = "proto3";

Member

Does using Google's Protocol Buffers in gRPC have any benefit over JSON? IIRC, protobuf is not a requirement for gRPC, and I'd insist on not adding more dependencies.

Need a decision on this @yair100

/// All structures match the Windows SDK definitions for USN_RECORD_V2
/// and MFT_ENUM_DATA_V0 used by FSCTL_ENUM_USN_DATA.
/// </summary>
internal static partial class NativeMethods

Member

Perhaps Files.App.CsWin32 should be used instead?

Comment thread tests/corpora/Program.cs
{
"small" => ("small", 50_000, 40L * 1024), // ~2 GiB
"medium" => ("medium", 500_000, 100L * 1024), // ~50 GiB
"large" => ("large", 2_000_000, 250L * 1024), // ~500 GiB

Member

500 GB is overkill for testing.

Comment thread tests/corpora/Program.cs
internal sealed class Xorshift64
{
private ulong _s;
public Xorshift64(ulong seed) { _s = seed == 0 ? 0xDEADBEEFCAFEBABEUL : seed; }

Member

Is this seed fallback intentional?
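
Some background that may help answer this: any xorshift step is built only from XORs and shifts, so a state of 0 maps back to 0 and the generator would emit zeros forever. Replacing a zero seed with a nonzero constant is the usual guard, and that is presumably the intent here. The sketch below shows the fixed point. The 13/7/17 shift triple is an assumption (a common choice for xorshift64), since the PR's step function is not quoted in this thread, and `XorshiftDemo` is a hypothetical name.

```csharp
using System;

// Hypothetical demo; the shift constants are assumed, not taken from the PR.
public static class XorshiftDemo
{
    // One xorshift64 step. Zero in means zero out, because XOR and shift
    // can never create a set bit from an all-zero state.
    public static ulong Next(ulong s)
    {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        return s;
    }

    // Mirrors the constructor quoted above: swap a zero seed for a fixed nonzero constant.
    public static ulong SanitizeSeed(ulong seed)
    {
        return seed == 0 ? 0xDEADBEEFCAFEBABEUL : seed;
    }
}
```

One consequence worth noting in the generator's docs: seed 0 and seed 0xDEADBEEFCAFEBABE produce the same corpus.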

{
var records = new List<DocRecord>
{
new(@"C:\root\测试\测试_file.txt", "测试_file.txt", 512UL, DateTime.UtcNow),

Member

Verify: Those characters may cause some issues with different environments and IDEs. Perhaps it's better to use a constant byte array and convert it to a string at runtime?
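
A minimal sketch of what that suggestion could look like, assuming the goal is simply to keep the source file pure ASCII. `NonAsciiTestData` and its members are hypothetical names. Both forms produce the two-character string used in the test above; Unicode escapes may be the lighter-weight option.

```csharp
using System;
using System.Text;

// Hypothetical helper: two ASCII-only ways to spell the non-ASCII test word.
public static class NonAsciiTestData
{
    // UTF-8 bytes for U+6D4B U+8BD5 (the two CJK characters in the quoted test path).
    private static readonly byte[] TestWordUtf8 = { 0xE6, 0xB5, 0x8B, 0xE8, 0xAF, 0x95 };

    // Decode the constant byte array at runtime.
    public static string FromBytes()
    {
        return Encoding.UTF8.GetString(TestWordUtf8);
    }

    // Same string via Unicode escapes; no runtime decoding needed.
    public static string FromEscapes()
    {
        return "\u6D4B\u8BD5";
    }

    // Builds the file name from the test, e.g. "<word>_file.txt".
    public static string FileName()
    {
        return FromEscapes() + "_file.txt";
    }
}
```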

hez2010 commented Jun 7, 2026

Member

Instead of pulling in the gRPC dependencies, I think for this purpose we should build on top of WinRT so that the search service becomes a WinRT out-of-proc server. See the Files.App.Server project.

Wingingbump

Author

@Josh65-2201
The service isn't starting because its binary doesn't get deployed on a clean build. Files.App copies files-search-service.exe via a content glob that is evaluated before the Files.SearchService project builds, so the first build leaves the SearchService\ folder empty.

  1. Build the Files.SearchService project first (or just rebuild the solution once more).
  2. Confirm the exe landed here:
    ...\Files.App\bin\<platform>\Debug\net10.0-windows10.0.26100.0\AppX\SearchService\files-search-service.exe
  3. Relaunch Files. It spawns the service itself as a child process over local TCP (no admin needed), and search should work.

I will need to fix this.

Josh65-2201 Jun 7, 2026

Member

If this is just a benchmark results file, it should be deleted.

Josh65-2201 added the "changes requested" label and removed the "ready for review" label Jun 15, 2026
Labels

changes requested

Development

Successfully merging this pull request may close these issues.

Feature: Everything search integration

5 participants