Optimizations for significantly faster downloads and cache hits #302

Open
DePasqualeOrg wants to merge 9 commits into huggingface:main from DePasqualeOrg:optimizations

DePasqualeOrg (Contributor) commented Dec 26, 2025

This PR offers significant improvements in download and cache performance, and also brings the Swift Hub implementation closer to feature parity with the Python huggingface_hub library.

Changes

1. Skip HEAD requests for cached files

When a file is already cached, the per-file HEAD request is now skipped. The snapshot function fetches the current commit hash once via getRepoInfo and passes it to each file download. If the local metadata records the same commit hash, the cached file is returned immediately, with no HEAD request needed to verify that it is unchanged.

Python equivalent: file_download.py:1082-1095
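
A minimal sketch of the commit-hash check described above. The names CachedFileMetadata, CacheDecision, and cacheDecision are hypothetical and chosen for illustration; they are not the types used in this PR. The sketch assumes the commit hash from getRepoInfo is passed in as an optional string and that any missing piece of information falls back to the per-file request.

```swift
import Foundation

/// Hypothetical stand-in for the metadata stored next to a cached file.
struct CachedFileMetadata {
    let commitHash: String
    let etag: String
}

/// Outcome of the cache check for a single file.
enum CacheDecision: Equatable {
    case useCached   // commit hash matches; no network request needed
    case revalidate  // fall back to the per-file HEAD request or download
}

/// Decides whether a cached file can be returned without a HEAD request.
/// Assumption: `remoteCommitHash` was fetched once for the whole snapshot.
func cacheDecision(
    remoteCommitHash: String?,
    local: CachedFileMetadata?,
    fileExists: Bool
) -> CacheDecision {
    guard fileExists,
          let local = local,
          let remoteCommitHash = remoteCommitHash
    else {
        return .revalidate
    }
    return local.commitHash == remoteCommitHash ? .useCached : .revalidate
}
```

With this shape, a snapshot of N cached files costs one repo-info request instead of N HEAD requests when nothing has changed upstream.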

2. Parallel file downloads

Files are now downloaded concurrently using a task group with a configurable concurrency limit. The default limit of 8 matches the Python library's default.

Python equivalent: _snapshot_download.py:449-455
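
One way to bound concurrency with a task group is sketched below. This is an illustration of the general pattern, not the code in this PR: mapConcurrently and its parameters are hypothetical names, and the operation closure stands in for a single file download.

```swift
import Foundation

/// Runs `operation` over `items` with at most `maxConcurrent` tasks in flight.
/// Results are returned in input order. Hypothetical helper for illustration.
func mapConcurrently<Input: Sendable, Output: Sendable>(
    _ items: [Input],
    maxConcurrent: Int = 8,
    operation: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")

    return try await withThrowingTaskGroup(
        of: (Int, Output).self,
        returning: [Output].self
    ) { group in
        var results = [Output?](repeating: nil, count: items.count)

        for (index, item) in items.enumerated() {
            // Once the limit is reached, wait for one task to finish
            // before starting the next one.
            if index >= maxConcurrent, let (doneIndex, output) = try await group.next() {
                results[doneIndex] = output
            }
            group.addTask {
                let output = try await operation(item)
                return (index, output)
            }
        }

        // Collect the results of the tasks that are still running.
        while let (doneIndex, output) = try await group.next() {
            results[doneIndex] = output
        }

        return results.compactMap { $0 }
    }
}
```

If any task throws, the error propagates out of the group and the remaining tasks are cancelled, which is usually the desired behavior for a snapshot download.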

3. Verify file integrity after download, skip re-hash on cache hit

LFS files (identified by SHA256 etags) are now verified after download. Previously, hash verification ran on every load in offline mode, adding 200 ms or more for large files. Now the hash is verified once at download time and the cache is trusted afterward.

Python equivalent: file_download.py:1394-1408
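
A rough sketch of a one-time post-download check, assuming CryptoKit is available and that the expected hash has already been extracted from the etag as a plain hex string. The names sha256Hex, verifyDownload, and IntegrityError are hypothetical and not taken from this PR.

```swift
import CryptoKit
import Foundation

enum IntegrityError: Error {
    case hashMismatch(expected: String, actual: String)
}

/// Streams the file through SHA-256 in chunks so that large weight files
/// are not loaded into memory all at once.
func sha256Hex(ofFileAt url: URL, chunkSize: Int = 1 << 20) throws -> String {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }

    var hasher = SHA256()
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}

/// Verifies a freshly downloaded file once. Cache hits skip this call.
func verifyDownload(at url: URL, expectedSHA256: String) throws {
    let actual = try sha256Hex(ofFileAt: url)
    guard actual == expectedSHA256.lowercased() else {
        throw IntegrityError.hashMismatch(expected: expectedSHA256, actual: actual)
    }
}
```

Calling verifyDownload only on the download path, and never on the cache-hit path, is what removes the repeated hashing cost in offline mode.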

4. Size-weighted progress reporting

Progress is now weighted by file size instead of file count. This provides smoother, more accurate progress bars for downloads containing a mix of small config files and large model weights.

The getRepoInfo function fetches file sizes via the blobs=true API parameter (see hf_api.py:2617), and ProgressCoordinator uses these sizes as weights for each file's contribution to overall progress.
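
To illustrate the weighting, here is a small self-contained sketch. SizeWeightedProgress is a hypothetical type written for this example and is not the ProgressCoordinator from this PR; it assumes file sizes in bytes are known up front, as they are when fetched with blobs=true.

```swift
import Foundation

/// Tracks overall progress where each file contributes in proportion
/// to its size in bytes. Hypothetical type for illustration only.
struct SizeWeightedProgress {
    private let sizes: [String: Int64]
    private var fractions: [String: Double] = [:]

    init(sizes: [String: Int64]) {
        self.sizes = sizes
    }

    /// Records the completed fraction (0...1) of a single file.
    /// Files that are not part of the snapshot are ignored.
    mutating func update(file: String, fractionCompleted: Double) {
        guard sizes[file] != nil else { return }
        fractions[file] = min(max(fractionCompleted, 0), 1)
    }

    /// Overall progress: completed bytes divided by total bytes.
    var fractionCompleted: Double {
        let totalBytes = sizes.values.reduce(0, +)
        guard totalBytes > 0 else { return 0 }
        let doneBytes = sizes.reduce(0.0) { sum, entry in
            sum + Double(entry.value) * (fractions[entry.key] ?? 0)
        }
        return doneBytes / Double(totalBytes)
    }
}
```

With a 1,000-byte config file and a 9,000-byte weights file, finishing the config file moves overall progress to 10% rather than the 50% that count-based reporting would show.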

Benchmark Results

Tested with mlx-community/Qwen3-0.6B-Base-DQ5 (11 MB tokenizer.json).

| Benchmark | Before | After | Improvement |
|---|---|---|---|
| Cached file retrieval | 782 ms | 267 ms | 2.9x faster |
| Offline mode cache hit | 4.87 ms | 0.14 ms | 35x faster |
| Parallel downloads | 1704 ms | 742 ms | 2.3x faster |

Testing

  • Added unit tests for getRepoInfo, snapshot caching, and offline mode
  • Added HubBenchmarks.swift with reproducible performance tests. You can check out commit 49f7e1b to run the benchmarks before the changes in this PR, and then run them again with the latest commit in this PR to see the difference. These benchmarks can be deleted before merging or kept for testing future improvements.

DePasqualeOrg (Contributor, Author) commented

I've added the same optimizations to swift-huggingface in huggingface/swift-huggingface#21, which required porting some missing functionality from swift-transformers and huggingface_hub to swift-huggingface.

DePasqualeOrg force-pushed the optimizations branch 4 times, most recently from bc00570 to dd87e89, on December 26, 2025 22:17
DePasqualeOrg changed the title from "Optimize Hub download and cache performance" to "Optimize download and cache performance" on Dec 27, 2025
DePasqualeOrg changed the title from "Optimize download and cache performance" to "Optimizations for significantly faster downloads and cache hits" on Dec 27, 2025

DePasqualeOrg (Contributor, Author) commented Dec 27, 2025

After #304 is merged, I'll move the benchmark test to the separate Benchmark target that was added in that PR so that it doesn't run in CI.

DePasqualeOrg marked this pull request as draft on January 5, 2026 12:45
DePasqualeOrg marked this pull request as ready for review on January 5, 2026 13:54

pcuenca (Member) left a comment

Looks directionally ok. A couple of initial comments before examining the code in detail:

  • It's important to verify that a HEAD or GET request is performed on cached repos in exactly the same way the Python library does them (except in offline mode)
  • Parallel downloading is very much dependent on the system and network. Have you considered the impact on iOS and mobile?
  • Size-weighted progress reporting is a great idea.
  • For the cached metadata, do we still verify if it changed server-side?

Also, note that the end goal is to proceed with swift-huggingface (#297)

DePasqualeOrg (Contributor, Author) commented Feb 7, 2026

> It's important to verify that a HEAD or GET request is performed on cached repos in exactly the same way the Python library does them (except in offline mode)

I took care to align with the Python library.

> Parallel downloading is very much dependent on the system and network. Have you considered the impact on iOS and mobile?

We could keep the default concurrency limit of 8 on macOS and set a lower default on iOS if you think that makes sense. The limit is configurable, so callers (like an iOS app) can pass a lower value.
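
A sketch of what a platform-dependent default could look like. The iOS value of 4 is an assumption made for this example and has not been benchmarked; DownloadDefaults and resolvedLimit are hypothetical names.

```swift
import Foundation

/// Hypothetical defaults for the concurrency limit.
enum DownloadDefaults {
    /// Default number of concurrent downloads for the current platform.
    static var maxConcurrentDownloads: Int {
        #if os(macOS)
        return 8  // matches the Python library's default
        #else
        return 4  // assumed lower default for iOS and other platforms
        #endif
    }

    /// Uses the caller's value when given, otherwise the platform default.
    /// Values below 1 are clamped to 1 so that downloads always make progress.
    static func resolvedLimit(_ requested: Int?) -> Int {
        max(1, requested ?? maxConcurrentDownloads)
    }
}
```

An iOS app that wants to be more conservative on cellular connections could pass a smaller value, for example DownloadDefaults.resolvedLimit(2).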

> For the cached metadata, do we still verify if it changed server-side?

Yes, through the commit hash.

> Also, note that the end goal is to proceed with swift-huggingface (#297)

I have implemented similar optimizations in huggingface/swift-huggingface#21.
