Skip to content

Conversation

@juhoinkinen
Copy link
Member

@juhoinkinen juhoinkinen commented Oct 31, 2025

When using a nondefault data directory location (with ANNIF_DATADIR environment variable) and running annif upload, the zipped projects and vocabs data files end up in a wrong path in the destination repository; the full local data directory gets created to the repo. For example, when using a temp directory, a project datafile with a path like this appears in the repo: tmp/tmp3ttl0hbh/data/projects/tfidf-fi.zip

This is fixed by getting the path to be used in the repo by reading the local datadir path from the right and using two parts of it (the project-id directory and the directory containing it, either projects or vocabs).

There was a unit test for this, but it was asserting a wrong path to be created in the repo.

I noticed this when working with #907.

@juhoinkinen juhoinkinen force-pushed the fix-hf-upload-destination-path branch from 0372c62 to 5bdb924 Compare November 3, 2025 09:10
@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 3, 2025

@codecov
Copy link

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.63%. Comparing base (371d1eb) to head (5bdb924).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #908   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files         103      103           
  Lines        8236     8236           
=======================================
  Hits         8206     8206           
  Misses         30       30           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@juhoinkinen juhoinkinen marked this pull request as ready for review November 3, 2025 15:43
@juhoinkinen juhoinkinen added this to the 1.4.1 milestone Nov 4, 2025
@juhoinkinen juhoinkinen requested a review from Copilot November 4, 2025 12:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the path construction logic for uploading project and vocabulary data to Hugging Face Hub to handle nested base data directories correctly.

  • Changed path extraction from split(os.path.sep, 1)[1] to rsplit(os.path.sep, 2)[1:3] to reliably extract the last two path components (typename/identifier)
  • Updated test assertions to match the new expected paths without the "data/" prefix

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
annif/hfh_util.py Modified _prepare_datadir_commit to use rsplit for extracting the last two path components, ensuring correct paths regardless of base datadir depth
tests/test_cli.py Updated test expectations to verify paths are "vocabs/dummy.zip" and "projects/dummy-fi.zip" instead of "data/vocabs/dummy.zip" and "data/projects/dummy-fi.zip"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@juhoinkinen juhoinkinen merged commit a923a56 into main Nov 4, 2025
40 of 48 checks passed
@juhoinkinen
Copy link
Member Author

Now I noticed this was only a partial fix: the member paths inside the zip still include the full datadir part:

unzip -l ~/Downloads/tfidf-fi.zip 
Archive:  /home/jmminkin/Downloads/tfidf-fi.zip
Archived by Annif 1.5.0.dev0
  Length      Date    Time    Name
---------  ---------- -----   ----
   406760  2025-11-11 13:45   tmp/data/projects/tfidf-fi/vectorizer
   159508  2025-11-11 13:45   tmp/data/projects/tfidf-fi/tfidf-matrix.npz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants