-
Notifications
You must be signed in to change notification settings - Fork 44
Fix zip destination path in Hugging Face repo when using custom data directory location #908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0372c62 to
5bdb924
Compare
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #908 +/- ##
=======================================
Coverage 99.63% 99.63%
=======================================
Files 103 103
Lines 8236 8236
=======================================
Hits 8206 8206
Misses 30 30 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the path construction logic for uploading project and vocabulary data to Hugging Face Hub to handle nested base data directories correctly.
- Changed path extraction from
split(os.path.sep, 1)[1]torsplit(os.path.sep, 2)[1:3]to reliably extract the last two path components (typename/identifier) - Updated test assertions to match the new expected paths without the "data/" prefix
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| annif/hfh_util.py | Modified _prepare_datadir_commit to use rsplit for extracting the last two path components, ensuring correct paths regardless of base datadir depth |
| tests/test_cli.py | Updated test expectations to verify paths are "vocabs/dummy.zip" and "projects/dummy-fi.zip" instead of "data/vocabs/dummy.zip" and "data/projects/dummy-fi.zip" |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Now I noticed this was only a partial fix: the member paths inside the zip still include the full datadir part: |



When using a nondefault data directory location (with
ANNIF_DATADIRenvironment variable) and runningannif upload, the zipped projects and vocabs data files end up in a wrong path in the destination repository; the full local data directory gets created to the repo. For example, when using a temp directory, a project datafile with a path like this appears in the repo:tmp/tmp3ttl0hbh/data/projects/tfidf-fi.zipThis is fixed by getting the path to be used in the repo by reading the local datadir path from the right and using two parts of it (the project-id directory and the directory containing it, either
projectsorvocabs).There was a unit test for this, but it was asserting a wrong path to be created in the repo.
I noticed this when working with #907.