feat: Persist hashes by honnix · Pull Request #119 · bazel-contrib/target-determinator

honnix · 2025-09-11T21:21:30Z

Claude's interpretation and implementation of #3 . This looks like a good foundation to start.

These were the prompts:

take a good look at #3, especially for the links; update the code to add a feature that can persist computed hash given a git commit sha; add another cli command that can take two persisted computed hash files, and identify the bazel diff

wrote a short for the persistencer and differ feature you just implemented in README.md

After a quick and naive test, it outputs the same list as using target-determinator CLI directly, for the diff between HEAD and HEAD~2

You can clearly tell the following is not written by me as a human.

✅ Implementation Complete

I've added a comprehensive hash persistence and comparison feature to the target-determinator project:

New Features Added:

Hash Persistence (pkg/hash_persistence.go):
- PersistHashes() - Saves computed target hashes to JSON files
- LoadPersistedHashes() - Loads persisted hash data from JSON files
- CompareHashFiles() - Compares two hash files and identifies differences
- Rich metadata including git commit SHA, timestamp, Bazel version, and workspace info
Hash Persister CLI (hash-persister/):
- Command: hash-persister --output
- Computes and persists target hashes for any git commit
- Supports all existing target-determinator flags (--targets, --bazel, etc.)
- Creates JSON files with comprehensive hash data
Hash Differ CLI (hash-differ/):
- Command: hash-differ
- Compares two persisted hash files to identify Bazel diffs
- Multiple output formats:
- --format targets: Lists affected target labels (compatible with target-determinator)
- --format summary: Human-readable summary with statistics
- --format json: Full structured JSON output
  - Efficient comparison without recomputing hashes

Key Benefits:

Performance: Compare commits without recomputing expensive target hashes
Persistence: Store hash computations for later analysis
Flexibility: Multiple output formats for different use cases
Compatibility: Integrates seamlessly with existing target-determinator workflow
Metadata: Rich information about when/how hashes were computed

Usage Examples:

Persist hashes for a commit

hash-persister --output commit1.json --targets "//..." abcd123

Persist hashes for another commit

hash-persister --output commit2.json --targets "//..." efgh456

Compare the commits to find affected targets

hash-differ --format targets commit1.json commit2.json

Get detailed comparison summary

hash-differ --format summary --verbose commit1.json commit2.json

Testing Results:

✅ Both binaries build successfully with Bazel✅ Hash persistence works correctly with real git commits✅ Hash comparison
correctly identifies differences between commits✅ Hash comparison correctly shows no differences for identical commits✅ All
output formats (targets, summary, json) work as expected✅ Integration with existing CLI flags and target patterns works

The implementation addresses the GitHub issue requirement for persisting computed hashes and comparing them to identify Bazel
diffs efficiently.

honnix · 2025-09-12T07:07:21Z

@sitaktif Does this look too crazy? If it is reasonable, I will continue down the road with manual touches.

sitaktif

Hey @honnix, this preliminary implementation looks pretty reasonable, thanks!

It's missing the integration with the main flow (target-determinator); things such as:

ensuring that target-determinator chooses the right file layout in a cache directory when loading an existing cache file. Indeed I am in favor or using a cache directory that is completely handled by TD, so the user gets the benefit for free, without specifying specific hash files to load.
Generally, how things are tied together
As a side note, ensuring that the --verbose option still works as expected (either write the full configuration when it's set, or just bypass caching)

One thing I'd like, also, is to create an interface for the cache. That will allow us to create different implementations (e.g. an S3 backed one).

Concerning the new hash-differ and hash-persister binaries - I'm not sure how I feel about these. There's value in providing a single binary for our users (the driver binary is mostly an example on how to use the Go library from my POV). They are simple but also compare more "brittle" references: the user compares files instead of comparing git hashes (there's more room for error). Happy to discuss!

honnix · 2025-09-15T09:23:29Z

@sitaktif Thank you for the feedback! I definitely see the points you made. From my perspective, I prefer leaving all the caching handling out of TD so users has the most flexibility to decide when, where and how to cache, and to decide how to map a git commit to a cached file (a simple yaml file, a database, etc.). A few reasons I can think of:

I think (I could be very wrong), most users of TD don't extend TD by writing extension code, so offering an interface and related mechanism to be able to allow users to apply different caching strategies in practice may not be so useful, and the cost of offering and maintaining those in TD is not negligible
It keeps TD still simple doing one thing and doing it very well; once opening the door offering cache capability, FRs for different backends will fly in, as well as for different caching policies/strategies, and to me they are very much out of the interest of TD
Small binaries doing one thing and communicating through pipe or file system, is a well proved design pattern and common practice in **nix world

These are my thoughts, and me too, happy to discuss.

honnix added 3 commits September 11, 2025 22:45

feat: Persist hashes

0d3bca8

Fix format

4e2950b

Generate doc

9c1ddc2

sitaktif reviewed Sep 15, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: Persist hashes#119

feat: Persist hashes#119
honnix wants to merge 3 commits intobazel-contrib:mainfrom
honnix:serialisation

honnix commented Sep 11, 2025 •

edited

Loading

Uh oh!

honnix commented Sep 12, 2025

Uh oh!

sitaktif left a comment

Uh oh!

honnix commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

honnix commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Persist hashes for a commit

Persist hashes for another commit

Compare the commits to find affected targets

Get detailed comparison summary

Uh oh!

honnix commented Sep 12, 2025

Uh oh!

sitaktif left a comment

Choose a reason for hiding this comment

Uh oh!

honnix commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

honnix commented Sep 11, 2025 •

edited

Loading