Bulk metadata processing script using json-schema and strict author ID matching #7517
Conversation
Build successful. Some useful links:
This preview will be removed when the branch is merged.
I merged master into here so I could test updates against the current database. The only actual changes are to requirements.txt (adding jsonschema) and process_bulk_metadata.py.
- argparse to docopt
- XML validation for abstract and title input
- improved branch switching (stash, index computation)
- simplified author-merging logic
- changed author-matching logic (2-step)
- more input validation on JSON data

Still needs testing
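As a rough illustration of what a 2-step author match could look like (this is a hypothetical sketch, not the script's actual implementation; the function and field names are assumptions): strict matching on explicit author IDs runs first, and only authors left unmatched fall back to name comparison.

```python
def match_authors(old_authors, new_authors):
    """Pair up authors in two steps: by explicit ID, then by exact name.

    Both arguments are lists of dicts with keys 'id', 'first', 'last'.
    Returns (matched_pairs, unmatched_old, unmatched_new).
    """
    matched = []
    unmatched_old = list(old_authors)
    unmatched_new = list(new_authors)

    # Step 1: strict match on explicit author IDs only.
    by_id = {a["id"]: a for a in unmatched_new if a.get("id")}
    for old in list(unmatched_old):
        new = by_id.get(old.get("id"))
        if new is not None and new in unmatched_new:
            matched.append((old, new))
            unmatched_old.remove(old)
            unmatched_new.remove(new)

    # Step 2: fall back to exact (first, last) name equality
    # for whatever is still unmatched after step 1.
    by_name = {(a.get("first"), a.get("last")): a for a in unmatched_new}
    for old in list(unmatched_old):
        new = by_name.get((old.get("first"), old.get("last")))
        if new is not None and new in unmatched_new:
            matched.append((old, new))
            unmatched_old.remove(old)
            unmatched_new.remove(new)

    return matched, unmatched_old, unmatched_new
```

The point of separating the steps is that an ID match is authoritative, so name-based guessing never overrides it.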
….yaml changes if running the script consecutively)
Renamed and moved to a subdirectory: bin/correct/bulk_process_metadata.py. The script has been working fine for me in
mbollmann left a comment
I re-checked the parts that interact with the library, and those LGTM. Note that I did not check the author-matching logic for the issue JSON; we probably shouldn't aim to cover that in a code review anyway, but rather through test cases first and foremost. Maybe we can add those soon?
Merging this version. Agreed that tests would be great to have!
Branches off of @weissenh's changes in #7395. The schema now allows for a `deleted_authors` entry for more explicit checking of the mapping between old and new authors. #7642 is the accompanying front-end change (the dialog stores more explicit info in the JSON).
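To make the `deleted_authors` idea concrete, here is a minimal illustrative sketch (not the repository's actual schema; all field names besides `deleted_authors` are assumptions) of validating a metadata-correction JSON with the jsonschema package this PR adds:

```python
import jsonschema

# Hypothetical, simplified schema for a metadata-correction request.
SCHEMA = {
    "type": "object",
    "properties": {
        "anthology_id": {"type": "string"},
        "authors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "first": {"type": "string"},
                    "last": {"type": "string"},
                    "id": {"type": ["string", "null"]},
                },
                "required": ["last"],
            },
        },
        # Explicit record of removed authors, so the old->new mapping
        # can be checked strictly instead of inferred.
        "deleted_authors": {
            "type": "array",
            "items": {"type": "string"},
        },
    },
    "required": ["anthology_id", "authors"],
}

metadata = {
    "anthology_id": "2023.acl-long.1",
    "authors": [{"first": "Ada", "last": "Lovelace", "id": "ada-lovelace"}],
    "deleted_authors": ["some-removed-author-id"],
}

# Raises jsonschema.ValidationError if the input does not conform.
jsonschema.validate(metadata, SCHEMA)
```

Validating up front means malformed issue JSON is rejected with a clear error before any author matching or branch switching happens.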
closes #7274
closes #6327