feat(core): batch-safe hashing with output fingerprinting #34446

Draft
FrozenPandaz wants to merge 10 commits into master from fix-batch-hash

FrozenPandaz (Collaborator) commented Feb 13, 2026

Current Behavior

In batch mode, all task hashes are computed upfront before the batch executor runs any tasks. Tasks with dependentTasksOutputFiles get hashed using whatever dependency outputs happen to be on disk from a previous run. This leads to:

  • False cache hits: If a dependency's sources changed but its old outputs are still on disk, the dependent task's hash matches a stale cache entry and wrong results are served.
  • False cache misses: On cold runs with no outputs on disk, the hash is computed without dependency output content and never matches any stored cache entry.
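
As a rough illustration of both failure modes, the toy model below hashes a dependent task from its own sources plus whatever dependency outputs happen to be on disk. `fnv1a` and `hashDependentTask` are hypothetical helpers invented for this sketch; they are not part of Nx, whose real task hasher is far more involved.

```typescript
// Toy model only: `fnv1a` and `hashDependentTask` are hypothetical stand-ins
// for the real Nx task hasher.

// Small deterministic string hash (32-bit FNV-1a) so the sketch has no dependencies.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// A dependent task's hash covers its own sources plus whatever dependency
// outputs are on disk at hashing time (null means nothing is on disk).
function hashDependentTask(ownSources: string, depOutputsOnDisk: string | null): string {
  return fnv1a(`${ownSources}|${depOutputsOnDisk ?? "<none>"}`);
}

// Previous run: the dependency was built from v1 sources and the dependent
// task's result was cached under this hash.
const cachedHash = hashDependentTask("app-src", "lib-output-v1");

// Batch run after the dependency's sources changed: hashing happens upfront,
// so the old v1 outputs are still what is on disk.
const upfrontHash = hashDependentTask("app-src", "lib-output-v1");
const falseCacheHit = upfrontHash === cachedHash; // the stale entry would be served

// Cold run: nothing is on disk, so the hash is computed without output content.
const coldHash = hashDependentTask("app-src", null);
const falseCacheMiss = coldHash !== cachedHash;

// Lazy hashing (after the dependency has run) sees the fresh v2 outputs instead.
const lazyHash = hashDependentTask("app-src", "lib-output-v2");

console.log({ falseCacheHit, falseCacheMiss, lazyDiffers: lazyHash !== cachedHash });
```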

Non-batch mode doesn't have this problem because it uses lazy hashing — tasks with depsOutputs are only hashed after their dependencies complete and fresh outputs exist on disk.

Additionally, the shouldCopyOutputsFromCache and recordOutputsHash methods only worked when the daemon was running. Without the daemon, shouldCopyOutputsFromCache always returned true (always re-copy) and recordOutputsHash was a no-op.

Expected Behavior

Batch mode correctly validates whether dependency outputs on disk match the dependency's current hash before trusting any cache entries. When outputs are stale, cache reads are skipped and tasks are re-hashed after the batch completes with fresh outputs. This validation works with or without the daemon.

How it works

  1. Hash all tasks in the batch (existing behavior)
  2. Validate: For each task with depsOutputs whose dependency is also in the batch, check if the dependency's outputs on disk were produced by its current hash. If not, mark the task as stale.
  3. Skip cache reads for stale tasks
  4. Run the batch
  5. Clear hashes on stale tasks, call hashTasks() again — outputs are now fresh
  6. postRunSteps stores cache entries under the correct hash
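
The six steps above can be sketched roughly as follows. This is a simplified, synchronous model under stated assumptions: `Task`, `BatchEnv`, `runBatchSafely`, and the in-memory `disk` and `cache` are hypothetical names made up for illustration, not the actual code in task-orchestrator.ts.

```typescript
// Hypothetical, simplified shapes. The real logic lives in task-orchestrator.ts
// and is asynchronous; none of these names are the actual Nx API.
interface Task {
  id: string;
  hash?: string;
  deps: string[]; // ids of tasks whose outputs feed into this task's hash
}

interface BatchEnv {
  hashTasks(tasks: Task[]): void; // hashes only tasks that have no hash yet
  outputsMatchHash(task: Task): boolean; // were the on-disk outputs produced by task.hash?
  readCache(task: Task): boolean; // true on a cache hit
  runTasks(tasks: Task[]): void;
  storeInCache(task: Task): void;
}

function runBatchSafely(batch: Task[], env: BatchEnv) {
  const byId = new Map(batch.map((t) => [t.id, t] as const));

  // 1. Hash all tasks upfront.
  env.hashTasks(batch);

  // 2. A task is stale when an in-batch dependency's outputs on disk
  //    were not produced by that dependency's current hash.
  const stale = batch.filter((t) =>
    t.deps.some((id) => {
      const dep = byId.get(id);
      return dep !== undefined && !env.outputsMatchHash(dep);
    })
  );

  // 3. Skip cache reads for stale tasks; everything that misses must run.
  const cacheHits: string[] = [];
  const toRun: Task[] = [];
  for (const t of batch) {
    if (!stale.includes(t) && env.readCache(t)) cacheHits.push(t.id);
    else toRun.push(t);
  }

  // 4. Run the batch.
  env.runTasks(toRun);

  // 5. Clear stale hashes and re-hash now that dependency outputs are fresh.
  for (const t of stale) delete t.hash;
  env.hashTasks(stale);

  // 6. Store results under the (now correct) hashes.
  for (const t of toRun) env.storeInCache(t);

  return { stale: stale.map((t) => t.id), cacheHits };
}

// In-memory stand-ins for the disk, cache, and hasher, only to exercise the sketch.
const sourceVersion = "v2"; // lib's sources changed since the previous run
const disk = new Map<string, string>([["lib", "lib@v1"]]); // task id -> hash that produced its outputs
const cache = new Set<string>(["lib@v1", "app@lib@v1"]); // entries from the previous run
const computeHash = (t: Task): string =>
  t.deps.length === 0
    ? `${t.id}@${sourceVersion}`
    : `${t.id}@${t.deps.map((d) => disk.get(d) ?? "none").join(",")}`;

const env: BatchEnv = {
  hashTasks: (tasks) => {
    for (const t of tasks) if (t.hash === undefined) t.hash = computeHash(t);
  },
  outputsMatchHash: (t) => disk.get(t.id) === t.hash,
  readCache: (t) => t.hash !== undefined && cache.has(t.hash),
  runTasks: (tasks) => {
    for (const t of tasks) disk.set(t.id, computeHash(t)); // dependencies are listed first
  },
  storeInCache: (t) => {
    if (t.hash !== undefined) cache.add(t.hash);
  },
};

const lib: Task = { id: "lib", deps: [] };
const app: Task = { id: "app", deps: ["lib"] };
const result = runBatchSafely([lib, app], env);
```

In this toy run, `app` is first hashed against the stale `lib@v1` outputs, which would have matched the old cache entry. Because step 2 flags it as stale, the cache read is skipped and the entry is stored under the re-computed hash instead.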

Output Fingerprinting (daemon-free fallback)

Previously, output freshness validation only worked with the daemon's in-memory tracking. This PR adds a persistent alternative:

  • OutputFingerprints (Rust/napi): SQLite-backed table that maps task_hash → fingerprint where fingerprint is a deterministic hash of all output files
  • hashTaskOutput (Rust/napi): Exposed existing Rust output hashing function to TypeScript — uses rayon for parallel file hashing
  • Unified outputsHashesMatch: Uses the daemon when available, falls back to comparing the current on-disk fingerprint against the stored DB fingerprint

recordOutputsHash now always persists fingerprints to the DB (in addition to notifying the daemon when available), so subsequent runs can validate outputs even without the daemon.
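
A minimal sketch of the daemon-or-database fallback described above. The interfaces `DaemonLike`, `FingerprintStore`, and `Fingerprinter` are assumptions made for illustration, and everything is synchronous for brevity; the real daemon client and the napi `OutputFingerprints` / `hashTaskOutput` bindings have their own signatures, which are not reproduced here. Treating a missing stored fingerprint as a mismatch is also an assumption.

```typescript
// Hypothetical, synchronous shapes for illustration only.
interface DaemonLike {
  outputsHashesMatch(outputs: string[], taskHash: string): boolean;
  recordOutputsHash(outputs: string[], taskHash: string): void;
}

interface FingerprintStore {
  get(taskHash: string): string | undefined;
  set(taskHash: string, fingerprint: string): void;
}

type Fingerprinter = (outputs: string[]) => string;

function outputsHashesMatch(
  taskHash: string,
  outputs: string[],
  store: FingerprintStore,
  fingerprint: Fingerprinter,
  daemon?: DaemonLike
): boolean {
  // Prefer the daemon's in-memory tracking when it is available.
  if (daemon) return daemon.outputsHashesMatch(outputs, taskHash);
  // Fallback: compare the current on-disk fingerprint with the stored one.
  const stored = store.get(taskHash);
  // Assumption: with no stored fingerprint, the outputs cannot be trusted.
  return stored !== undefined && stored === fingerprint(outputs);
}

function recordOutputsHash(
  taskHash: string,
  outputs: string[],
  store: FingerprintStore,
  fingerprint: Fingerprinter,
  daemon?: DaemonLike
): void {
  store.set(taskHash, fingerprint(outputs)); // always persist
  daemon?.recordOutputsHash(outputs, taskHash); // additionally notify the daemon if present
}

// In-memory stand-ins for the output files and the SQLite-backed table.
const files = new Map<string, string>([["dist/lib/index.js", "export const v = 1;"]]);
const fingerprintFiles: Fingerprinter = (outputs) =>
  [...outputs].sort().map((p) => `${p}=${files.get(p) ?? ""}`).join(";");
const store = new Map<string, string>();
const outs = ["dist/lib/index.js"];

recordOutputsHash("hash-1", outs, store, fingerprintFiles);
const freshMatch = outputsHashesMatch("hash-1", outs, store, fingerprintFiles);

files.set("dist/lib/index.js", "export const v = 2;"); // outputs change on disk
const staleMatch = outputsHashesMatch("hash-1", outs, store, fingerprintFiles);
const unknownMatch = outputsHashesMatch("hash-2", outs, store, fingerprintFiles);
```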

Run-by-run behavior

| Run | Dep outputs on disk? | Outputs match dep hash? | Cache result |
| --- | --- | --- | --- |
| 1 (cold) | No | No → skip cache | Miss → runs → stores correct hash |
| 2 (same sources) | Yes, from run 1 | Yes → trust hash | Hit |
| N (dep sources changed) | Yes, but stale | No → skip cache | Miss → runs → stores correct hash |

Files changed

| File | Change |
| --- | --- |
| packages/nx/src/native/tasks/output_fingerprints.rs | New: DB-backed OutputFingerprints struct |
| packages/nx/src/native/tasks/hashers/hash_task_output.rs | Expose hashTaskOutput via napi |
| packages/nx/src/native/tasks/task_hasher.rs | Import rename |
| packages/nx/src/native/tasks/mod.rs | Register new module |
| packages/nx/src/native/db/initialize.rs | Register new table schema |
| packages/nx/src/tasks-runner/task-orchestrator.ts | Batch validation, unified outputsHashesMatch, recordOutputsHash with DB persistence |

Related Issue(s)

Related to #30949

nx-cloud bot (Contributor) commented Feb 13, 2026

View your CI Pipeline Execution ↗ for commit fd6015c

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx affected --targets=lint,test,test-kt,build,e... | ❌ Failed | 47m 54s | View ↗ |
| nx run-many -t check-imports check-lock-files c... | ✅ Succeeded | 3m 7s | View ↗ |
| nx-cloud record -- nx-cloud conformance:check | ✅ Succeeded | 8s | View ↗ |
| nx-cloud record -- nx format:check | ✅ Succeeded | 2s | View ↗ |
| nx-cloud record -- nx sync:check | ✅ Succeeded | <1s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-02-21 01:50:05 UTC

FrozenPandaz force-pushed the fix-batch-hash branch 3 times, most recently from cf36e18 to 71d9c98, on February 13, 2026 16:52
netlify bot commented Feb 13, 2026

Deploy Preview for nx-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | fd6015c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nx-docs/deploys/699902ea6ee40c00084eac1e |
| 😎 Deploy Preview | https://deploy-preview-34446--nx-docs.netlify.app |
netlify bot commented Feb 13, 2026

Deploy Preview for nx-dev ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | fd6015c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nx-dev/deploys/699902ead7ce1e0007df778f |
| 😎 Deploy Preview | https://deploy-preview-34446--nx-dev.netlify.app |
FrozenPandaz changed the title from "fix(core): validate batch task hashes against stale dependency outputs" to "feat(core): batch-safe hashing with output fingerprinting" on Feb 13, 2026

FrozenPandaz force-pushed the fix-batch-hash branch 2 times, most recently from ef9a35e to 92f349b, on February 20, 2026 16:11

FrozenPandaz force-pushed the fix-batch-hash branch 2 times, most recently from 940564b to b74c720, on February 20, 2026 19:57

When tasks with dependentTasksOutputFiles are co-batched with their
dependencies, hashes are computed using outputs from a previous run
that may be stale. This adds a validation step that checks whether
dependency outputs on disk match the dependency's current hash, skips
cache reads for untrustworthy hashes, and re-hashes after the batch
completes with fresh outputs.

Add OutputFingerprints Rust struct backed by SQLite for persisting
output file checksums. Expose hashTaskOutput via napi so TypeScript
can fingerprint outputs without the daemon. Unify outputsHashesMatch
to use daemon when available, falling back to DB fingerprints.

With output fingerprinting, outputs that already match the cache are
left in place instead of being re-copied. Update e2e assertions to
expect "existing outputs match the cache, left as is" instead of
"local cache" when daemon is disabled or outputs are untouched.

Add null checks before calling hashTaskOutput to prevent
"Failed to convert JavaScript value Undefined into rust type String"
when tasks have no outputs defined (e.g. maven tasks).
Also update symlink cache test assertion.
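
A guard of the kind described in the null-check commit might look roughly like this. `TaskLike`, `fingerprintOrNull`, and `fakeHashOutputs` are hypothetical names; `hashOutputs` merely stands in for the napi-exposed hashTaskOutput, whose real signature is not shown in this PR description. Treating an empty outputs list the same as a missing one is an assumption.

```typescript
// Hypothetical shapes for illustration; not the actual Nx types.
interface TaskLike {
  hash?: string;
  outputs?: string[];
}

function fingerprintOrNull(
  task: TaskLike,
  hashOutputs: (outputs: string[]) => string
): string | null {
  // Passing undefined across the napi boundary is what produced
  // "Failed to convert JavaScript value Undefined into rust type String",
  // so bail out before calling into native code.
  if (task.hash == null || task.outputs == null || task.outputs.length === 0) {
    return null;
  }
  return hashOutputs(task.outputs);
}

// Stand-in for the native hashing function, used only to exercise the guard.
const fakeHashOutputs = (outputs: string[]): string => outputs.join("|");

const withOutputs = fingerprintOrNull({ hash: "abc", outputs: ["dist/app"] }, fakeHashOutputs);
const withoutOutputs = fingerprintOrNull({ hash: "abc" }, fakeHashOutputs);
```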

With output fingerprinting enabled for all cache operations (not just
batch mode), tasks whose outputs are already on disk now correctly
report "existing outputs match the cache" instead of "local cache".

Updated e2e assertions across cache, run, ng-add, and nx-init-angular
tests to expect the new status when outputs haven't been deleted.

Wrap identifyTasksWithStaleDepsOutputs and getInputs in try-catch so
that targets without proper input configuration (e.g. inferred maven
targets) don't crash the entire batch execution.

hashTasks filters out tasks that already have a hash. The previous code
cleared hashes on the result copies (created by runBatch via spread) but
called hashTasks on batch.taskGraph which holds the originals, still
with their hashes. This caused hashTasks to skip them entirely, leaving
the copies with undefined hashes that crashed napi when passed to
cache.put.

Clear hashes on the originals so hashTasks picks them up, then sync the
fresh hashes back to the result copies.
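
The copy-versus-original bug described in the commit message above can be reproduced in miniature. This is a sketch under assumptions: `Task` and this `hashTasks` are simplified stand-ins (the real hashTasks is not shown here), and the only behavior modeled is that tasks which already have a hash are skipped.

```typescript
// Simplified stand-ins; not the actual Nx types or hashTasks implementation.
interface Task {
  id: string;
  hash?: string;
}

// Like the behavior described above: tasks that already have a hash are skipped.
function hashTasks(tasks: Task[], computeHash: (t: Task) => string): void {
  for (const t of tasks) {
    if (t.hash === undefined) t.hash = computeHash(t);
  }
}

const originals: Task[] = [{ id: "app:build", hash: "stale" }];

// Buggy sequence: clear hashes on spread copies, then re-hash the originals.
const buggyCopies = originals.map((t) => ({ ...t }));
for (const c of buggyCopies) delete c.hash;
hashTasks(originals, () => "fresh"); // skipped: the originals still carry a hash
const hashesAfterBug = originals.map((t) => t.hash); // still the stale hash
// buggyCopies now have undefined hashes, which is what reached cache.put.

// Fixed sequence: clear the originals, re-hash them, then sync back to the copies.
const fixedCopies = originals.map((t) => ({ ...t }));
for (const t of originals) delete t.hash;
hashTasks(originals, () => "fresh");
const byId = new Map(originals.map((t) => [t.id, t] as const));
for (const c of fixedCopies) c.hash = byId.get(c.id)?.hash;
```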

The testCompile target hash was missing src/test/java/**/*.java because
CacheConfig used the wrong parameter name for the compiler plugin's
test source roots. Also removes an unnecessary fallback in
MavenExpressionResolver that masked the issue.

nx-cloud bot (Contributor) left a comment

Important

At least one additional CI pipeline execution has run since the conclusion below was written and it may no longer be applicable.

Nx Cloud has identified a possible root cause for your failed CI:

The e2e-gradle test failure appears to be an environment_state issue rather than a code defect. The failing test is in a project not modified by our PR (e2e-gradle is not in touched_projects), the error occurs in an unchanged test file, and the similar-task-failure-detector confirms this error pattern doesn't exist on master. The process execution failure when running nx show projects has no logical connection to our batch hashing and output fingerprinting changes in task-orchestrator.ts.

No code changes were suggested for this issue.
