Optimizations for significantly faster downloads and cache hits #302
DePasqualeOrg wants to merge 9 commits into huggingface:main
I've added the same optimizations to swift-huggingface in huggingface/swift-huggingface#21, which required porting some missing functionality from swift-transformers and huggingface_hub to swift-huggingface.
After #304 is merged, I'll move the benchmark test to the separate Benchmark target that was added in that PR so that it doesn't run in CI.
pcuenca left a comment
Looks directionally ok. A couple of initial comments before examining the code in detail:
- It's important to verify that a `HEAD` or `GET` request is performed on cached repos in exactly the same way the Python library does them (except in offline mode)
- Parallel downloading is very much dependent on the system and network. Have you considered the impact on iOS and mobile?
- Size-weighted progress reporting is a great idea.
- For the cached metadata, do we still verify if it changed server-side?
Also, note that the end goal is to proceed with swift-huggingface (#297)
I took care to align with the Python library.
We could keep the default concurrency limit of 8 on macOS and set a lower default on iOS if you think that makes sense. The limit is configurable, so callers (like an iOS app) can pass a lower value.
Yes, the cached metadata is still checked against the server, through the commit hash.
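To illustrate the commit-hash check, here is a minimal sketch. It is not the code in this PR: `CachedFileMetadata`, `CacheDecision`, and `decideCacheUse` are hypothetical names, and the sketch assumes the current commit hash has already been fetched once for the whole snapshot.

```swift
import Foundation

/// Hypothetical record of what was stored locally when a file was downloaded.
struct CachedFileMetadata {
    let commitHash: String
    let localURL: URL
}

/// Hypothetical result of the cache check.
enum CacheDecision: Equatable {
    case useCached(URL)
    case needsDownload
}

/// Compares the commit hash recorded at download time with the hash fetched
/// once for the snapshot. A match means the file can be returned without a
/// per-file HEAD request; anything else falls back to a download.
func decideCacheUse(
    cached: CachedFileMetadata?,
    currentCommitHash: String,
    fileExists: (URL) -> Bool = { FileManager.default.fileExists(atPath: $0.path) }
) -> CacheDecision {
    guard let cached,
          cached.commitHash == currentCommitHash,
          fileExists(cached.localURL)
    else {
        return .needsDownload
    }
    return .useCached(cached.localURL)
}
```

If the repo has a new commit, the hashes differ and the file goes through the normal download path, so a server-side change is still detected with a single request per snapshot.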
I have implemented similar optimizations in huggingface/swift-huggingface#21.
This PR offers significant improvements in download and cache performance, and also brings the Swift Hub implementation closer to feature parity with the Python huggingface_hub library.
Changes
1. Skip HEAD requests for cached files
When downloading files that are already cached, we now skip the individual HEAD requests per file. The `snapshot` function fetches the current commit hash once via `getRepoInfo`, then passes it to each file download. If the local metadata shows the same commit hash, the file is returned immediately, with no HEAD request needed to verify it's unchanged.
Python equivalent: `file_download.py:1082-1095`
2. Parallel file downloads
Files are now downloaded concurrently using a task group with a configurable number of concurrent downloads, matching the Python library's default of 8.
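A minimal sketch of a task group with a concurrency cap follows. It is an illustration under assumptions, not the code in this PR: `mapConcurrently` and its `maxConcurrent` parameter are hypothetical names, and the `operation` closure stands in for a single file download.

```swift
import Foundation

/// Hypothetical helper: applies `operation` to every item while keeping at
/// most `maxConcurrent` child tasks in flight. Results keep the input order.
func mapConcurrently<Item: Sendable, Output: Sendable>(
    _ items: [Item],
    maxConcurrent: Int = 8,
    operation: @escaping @Sendable (Item) async throws -> Output
) async throws -> [Output] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")

    return try await withThrowingTaskGroup(
        of: (Int, Output).self,
        returning: [Output].self
    ) { group in
        var results = [Output?](repeating: nil, count: items.count)
        var nextIndex = 0

        // Start the first batch, up to the concurrency limit.
        while nextIndex < min(maxConcurrent, items.count) {
            let index = nextIndex
            let item = items[index]
            group.addTask {
                let output = try await operation(item)
                return (index, output)
            }
            nextIndex += 1
        }

        // Each time a task finishes, store its result and start the next item.
        while let (index, output) = try await group.next() {
            results[index] = output
            if nextIndex < items.count {
                let index = nextIndex
                let item = items[index]
                group.addTask {
                    let output = try await operation(item)
                    return (index, output)
                }
                nextIndex += 1
            }
        }

        // Every slot has been filled once the group is drained.
        return results.compactMap { $0 }
    }
}
```

A caller on a constrained device could pass a lower `maxConcurrent`. If any task throws, the error propagates and the remaining child tasks are cancelled when the group exits.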
Python equivalent: `_snapshot_download.py:449-455`
3. Verify file integrity after download, skip re-hash on cache hit
LFS files (identified by SHA256 etags) are now verified after download. Previously, hash verification ran on every load in offline mode, adding ~200 ms+ for large files. Now we verify once at download time and trust the cache afterward.
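The verify-once idea can be sketched as below. This is an assumption-laden illustration, not the code in this PR: `looksLikeSHA256`, `sha256Hex`, `verifyDownload`, and `IntegrityError` are hypothetical names, and the sketch hashes in-memory `Data` with CryptoKit (Apple platforms), whereas large weights would more likely be hashed in chunks from disk.

```swift
import CryptoKit
import Foundation

/// Hypothetical error for a failed integrity check.
enum IntegrityError: Error, Equatable {
    case hashMismatch(expected: String, actual: String)
}

/// Treats an etag as an LFS SHA-256 digest when it is exactly 64 ASCII hex characters.
func looksLikeSHA256(_ etag: String) -> Bool {
    etag.count == 64 && etag.allSatisfy { $0.isASCII && $0.isHexDigit }
}

/// Returns the lowercase hex SHA-256 digest of `data`.
func sha256Hex(of data: Data) -> String {
    SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

/// Verifies freshly downloaded bytes against the etag. Etags that are not
/// SHA-256 digests are skipped, since there is no content hash to compare with.
func verifyDownload(_ data: Data, etag: String) throws {
    guard looksLikeSHA256(etag) else { return }
    let expected = etag.lowercased()
    let actual = sha256Hex(of: data)
    guard actual == expected else {
        throw IntegrityError.hashMismatch(expected: expected, actual: actual)
    }
}
```

Running this only at download time, and not on every cache hit, is what removes the repeated hashing cost on later loads.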
Python equivalent: `file_download.py:1394-1408`
4. Size-weighted progress reporting
Progress is now weighted by file size instead of file count. This provides smoother, more accurate progress bars for downloads containing a mix of small config files and large model weights.
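As a rough illustration of the weighting, the sketch below computes overall progress from bytes, not from file counts. `WeightedProgress` is a hypothetical type written for this example and is not the `ProgressCoordinator` in this PR; the file names and sizes are made up.

```swift
/// Hypothetical tracker: each file contributes to overall progress in
/// proportion to its size in bytes.
struct WeightedProgress {
    private let sizes: [String: Int64]
    private var downloaded: [String: Int64] = [:]

    init(sizes: [String: Int64]) {
        self.sizes = sizes
    }

    /// Records how many bytes of `file` have arrived, clamped to its known size.
    mutating func update(file: String, bytesDownloaded: Int64) {
        guard let size = sizes[file] else { return } // unknown files are ignored
        downloaded[file] = min(max(bytesDownloaded, 0), size)
    }

    /// Overall progress in 0...1, weighted by file size.
    var fractionCompleted: Double {
        let total = sizes.values.reduce(0, +)
        guard total > 0 else { return 0 }
        let done = downloaded.values.reduce(0, +)
        return Double(done) / Double(total)
    }
}

// Made-up sizes: a 1 KB config file next to 999 KB of weights.
var progress = WeightedProgress(sizes: [
    "config.json": 1_000,
    "model.safetensors": 999_000,
])
progress.update(file: "config.json", bytesDownloaded: 1_000)
// Finishing the small file moves overall progress to about 0.1%,
// where counting files would have reported 50%.
```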
The `getRepoInfo` function fetches file sizes via the `blobs=true` API parameter (see `hf_api.py:2617`), and `ProgressCoordinator` uses these sizes as weights for each file's contribution to overall progress.
Benchmark Results
Tested with `mlx-community/Qwen3-0.6B-Base-DQ5` (11 MB tokenizer.json).
Testing
- Tests for `getRepoInfo`, snapshot caching, and offline mode
- `HubBenchmarks.swift` with reproducible performance tests. You can check out commit 49f7e1b to run the benchmarks before the changes in this PR, and then run them again with the latest commit in this PR to see the difference. These benchmarks can be deleted before merging or kept for testing future improvements.