Bulk metadata processing script using json-schema and strict author ID matching #7517

Merged: nschneid merged 24 commits into master from json-schema on Mar 3, 2026

nschneid (Collaborator) commented Feb 14, 2026

Branches off of @weissenh's changes in #7395. The schema now allows for a deleted_authors entry for more explicit checking of the mapping between old and new authors.

#7642 is the accompanying front-end change (dialog stores more explicit info in JSON).

closes #7274
closes #6327
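
To illustrate what an explicit check of the old-to-new author mapping could look like, here is a minimal sketch. It is not the code from this PR: the function name `check_author_mapping` and the `id` key on each author record are assumptions made for illustration, and only the `deleted_authors` idea comes from the description above. Because authors are matched by ID rather than by position, a reordered list passes as long as every old author is either kept or explicitly deleted.

```python
def check_author_mapping(old_authors, new_authors, deleted_authors):
    """Return a list of problems; an empty list means the mapping is consistent.

    Each author is a dict with a hypothetical "id" key (an assumption,
    not the schema used by the actual script).
    """
    old_ids = [a["id"] for a in old_authors]
    new_ids = [a["id"] for a in new_authors]
    deleted_ids = [a["id"] for a in deleted_authors]

    problems = []
    # Every old author must be accounted for exactly once.
    for author_id in old_ids:
        kept = author_id in new_ids
        deleted = author_id in deleted_ids
        if kept and deleted:
            problems.append(f"{author_id}: both kept and deleted")
        elif not kept and not deleted:
            problems.append(f"{author_id}: missing without an explicit deletion")
    # A deletion only makes sense for an author who was there before.
    for author_id in deleted_ids:
        if author_id not in old_ids:
            problems.append(f"{author_id}: deleted but not among the old authors")
    return problems


old = [{"id": "a-smith"}, {"id": "b-jones"}, {"id": "c-lee"}]
new = [{"id": "c-lee"}, {"id": "a-smith"}]  # reordered, one author dropped

# Consistent: the dropped author is listed explicitly.
print(check_author_mapping(old, new, [{"id": "b-jones"}]))
# Inconsistent: the dropped author is not listed anywhere.
print(check_author_mapping(old, new, []))
```

In this sketch, new authors that have no old counterpart are treated as additions and are not flagged; a stricter variant could require those to be declared as well.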

nschneid changed the base branch from master to update-script-process-bulk-metadata on February 14, 2026, 14:44
github-actions bot commented Feb 14, 2026

Build successful. Some useful links:

This preview will be removed when the branch is merged.

nschneid (Collaborator, Author) commented

I merged master into this branch so I could test updates against the current database. The only actual changes are to requirements.txt (adding jsonschema) and process_bulk_metadata.py.

nschneid changed the base branch from update-script-process-bulk-metadata to master on March 2, 2026, 19:18
nschneid changed the title from "Change JSON validation to use json-schema" to "Bulk metadata processing script using json-schema and strict author ID matching" on Mar 2, 2026
nschneid (Collaborator, Author) commented Mar 2, 2026

Renamed and moved to a subdirectory: bin/correct/bulk_process_metadata.py

The script has been working fine for me in --dry-run mode (it makes the commits, but I create the PRs manually). I propose we merge to master and then add other data correction scripts that use the new library.
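
As a rough illustration of the dry-run behavior described above (commits are made, pull requests are not), a flag like this could be wired up with `argparse` as follows. Only the `--dry-run` flag name comes from the comment; `build_parser`, `planned_steps`, and the step names are hypothetical and do not reflect the actual script.

```python
import argparse


def build_parser():
    """Build a command-line parser with a --dry-run switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a bulk metadata processor command line"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="commit the changes but do not open pull requests",
    )
    return parser


def planned_steps(dry_run):
    """List the steps that would run; the step names are illustrative only."""
    steps = ["apply corrections", "commit changes"]
    if not dry_run:
        # Only a real run goes on to open a pull request.
        steps.append("open pull request")
    return steps


args = build_parser().parse_args(["--dry-run"])
print(planned_steps(args.dry_run))
```

Keeping the decision in one small function makes it easy to see which side effects a dry run skips.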

nschneid marked this pull request as ready for review on March 2, 2026, 19:33
mbollmann (Member) left a comment

I re-checked the parts that interact with the library and those LGTM. Importantly, I did not check the author matching logic from the issue JSON. We probably shouldn’t aim to verify that in a code review anyway, but rather by writing test cases first and foremost. Maybe we can add those soon?
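
A sketch of what such test cases might look like, using the standard `unittest` module. The function under test, `match_by_id`, is a hypothetical stand-in written for this example; the real matching logic in the script is not shown in this conversation and may differ considerably.

```python
import unittest


def match_by_id(old_ids, new_ids):
    """For each new position, return the old position holding the same ID.

    Hypothetical stand-in for the script's matching logic. Raises ValueError
    when an ID is duplicated or when a new ID has no old counterpart.
    """
    if len(set(old_ids)) != len(old_ids) or len(set(new_ids)) != len(new_ids):
        raise ValueError("duplicate author ID")
    positions = {author_id: i for i, author_id in enumerate(old_ids)}
    unknown = [a for a in new_ids if a not in positions]
    if unknown:
        raise ValueError(f"unknown author IDs: {unknown}")
    return [positions[a] for a in new_ids]


class MatchByIdTests(unittest.TestCase):
    def test_reordering_is_tracked(self):
        self.assertEqual(match_by_id(["a", "b", "c"], ["c", "a", "b"]), [2, 0, 1])

    def test_unknown_id_is_rejected(self):
        with self.assertRaises(ValueError):
            match_by_id(["a", "b"], ["a", "z"])

    def test_duplicate_id_is_rejected(self):
        with self.assertRaises(ValueError):
            match_by_id(["a", "a"], ["a"])


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MatchByIdTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print("all tests passed:", result.wasSuccessful())
```

Cases like these pin down the intended behavior for reordering and for bad input, which is hard to confirm by reading the code alone.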

nschneid (Collaborator, Author) commented Mar 3, 2026

Merging this version. Agreed that tests would be great to have!

nschneid merged commit b84019c into master on Mar 3, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues:

Rewrite process_bulk_metadata.py to use new library
Handle reordering of authors in process_bulk_metadata.py

3 participants