Fix(lpips): load ImageNet backbone weights for pretrained models #4557
laggui merged 7 commits into tracel-ai:main
Conversation
Support loading PyTorch models saved in TAR format (pre-1.6), such as AlexNet and SqueezeNet from torchvision.
- Load the backbone weights as well, not just the linear weights
Codecov Report

❌ Your patch check has failed because the patch coverage (77.46%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@ Coverage Diff @@
##             main    #4557      +/-   ##
==========================================
+ Coverage   61.21%   62.04%   +0.83%
==========================================
  Files        1062     1074      +12
  Lines      136012   138456    +2444
==========================================
+ Hits        83258    85907    +2649
+ Misses      52754    52549     -205
laggui left a comment
Currently, burn-train --features vision tests are not included in CI, so LPIPS pretrained test failures were not caught. Adding vision feature tests to CI should be considered in a follow-up PR.
Well that explains why it wasn't caught 😅 thanks
Current CI does not include this. If you agree with the change, I will open a PR:

// burn-train vision
helpers::custom_crates_tests(
    vec!["burn-train"],
    handle_test_args(&["--features", "vision"], args.release),
    None,
    None,
    "std vision",
)?;
Yeah we should do that! Can even be part of this PR if you want since it's related.
antimora left a comment
Thanks for working on this! TAR format support is a useful addition since a number of older torchvision models (AlexNet, SqueezeNet) on PyTorch Hub are in this format, and the two-step weight loading approach for LPIPS is the right design (separate ImageNet backbone weights + LPIPS linear layer weights).
That said, there are several issues that need to be addressed before this can be merged. The biggest one: there are no integration tests for the TAR format itself. burn-store has ~47 test functions covering ZIP and legacy formats with Python-generated fixtures, but zero tests for TAR loading. We need the same treatment here.
Summary of issues (details in inline comments):
- Missing integration tests for TAR format in burn-store (critical)
- ~440 lines of copy-pasted code between rebuild_tensor and rebuild_tensor_v2
- BFloat16 bug in TarSource element size mapping (will silently corrupt data)
- Silent default to F32 in multiple places instead of returning errors for unknown storage types
- Unused parameter, wasteful allocation in read_range, and debug println! left in tests
{
    1
} else {
    4 // Default to float (4 bytes)
Bug: "BFloat16Storage".contains("Half") returns false, so BFloat16 falls through to 4. BFloat16 is 2 bytes. This will silently compute wrong offsets and corrupt tensor data.
The rebuild_tensor / rebuild_tensor_v2 code has the correct explicit mapping ("BFloat16Storage" => DType::BF16). Consider extracting a shared storage_type_to_dtype(name) -> DType helper and using dtype.size() here instead of a separate if/else chain with contains checks.
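A minimal sketch of the suggested helper. `DType`, its variants, and the exact set of storage class names are stand-ins here, not burn-store's actual definitions:

```rust
// Sketch only: `DType` mirrors the idea of burn-store's dtype enum, but the
// names and variant set below are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F64,
    F32,
    F16,
    BF16,
    I64,
    I32,
    I16,
    U8,
}

impl DType {
    fn size(self) -> usize {
        match self {
            DType::F64 | DType::I64 => 8,
            DType::F32 | DType::I32 => 4,
            DType::F16 | DType::BF16 | DType::I16 => 2,
            DType::U8 => 1,
        }
    }
}

/// Map a PyTorch storage class name to a dtype, erroring on unknown names
/// instead of silently defaulting to F32.
fn storage_type_to_dtype(name: &str) -> Result<DType, String> {
    match name {
        "DoubleStorage" => Ok(DType::F64),
        "FloatStorage" => Ok(DType::F32),
        "HalfStorage" => Ok(DType::F16),
        "BFloat16Storage" => Ok(DType::BF16),
        "LongStorage" => Ok(DType::I64),
        "IntStorage" => Ok(DType::I32),
        "ShortStorage" => Ok(DType::I16),
        "ByteStorage" => Ok(DType::U8),
        other => Err(format!("unknown storage type: {other}")),
    }
}

fn main() {
    // BFloat16 is 2 bytes; the contains("Half") check wrongly mapped it to 4.
    assert_eq!(storage_type_to_dtype("BFloat16Storage").unwrap().size(), 2);
    assert!(storage_type_to_dtype("MysteryStorage").is_err());
}
```

An explicit match also makes an added storage type a visible compile-time/review-time change instead of a silent fallthrough.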
/// * `storages_data` - Raw storages blob with structure:
///   - Count pickle (number of storages)
///   - For each storage: metadata pickle + u64 num_elements + raw binary data
pub fn new(_tensors_data: &[u8], storages_data: Vec<u8>) -> std::io::Result<Self> {
_tensors_data is never used. It propagates up to LazyDataSource::from_tar and the callsite in reader.rs, making the API misleading. Remove it.
.unwrap_or_else(|poisoned| poisoned.into_inner());
let data = source.read(key)?;
let end = (offset + length).min(data.len());
Ok(data[offset..end].to_vec())
Double allocation: source.read(key) allocates a Vec<u8> for the full storage, then this line allocates again for the slice. Since TarSource already holds the full blob in memory (storages_data), add a read_range method on TarSource that slices directly from self.storages_data[storage_offset + offset..].
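A sketch of the suggested `read_range` on `TarSource`, assuming the field names (`storages_data`, a per-key offset map) — the crate's real layout may differ:

```rust
use std::collections::HashMap;
use std::io::{Error, ErrorKind};

// Simplified stand-in for the crate's TarSource; field names are assumptions.
struct TarSource {
    /// Full storages blob kept in memory.
    storages_data: Vec<u8>,
    /// Byte offset of each storage inside `storages_data`.
    storage_offsets: HashMap<String, usize>,
}

impl TarSource {
    /// Borrow the requested range directly from the in-memory blob instead of
    /// allocating the full storage and then copying the slice again.
    fn read_range(&self, key: &str, offset: usize, length: usize) -> std::io::Result<&[u8]> {
        let base = *self
            .storage_offsets
            .get(key)
            .ok_or_else(|| Error::new(ErrorKind::NotFound, format!("unknown storage: {key}")))?;
        let start = base + offset;
        let end = (start + length).min(self.storages_data.len());
        Ok(&self.storages_data[start..end])
    }
}

fn main() {
    let mut offsets = HashMap::new();
    offsets.insert("w".to_string(), 4);
    let source = TarSource {
        storages_data: vec![0, 1, 2, 3, 4, 5, 6, 7],
        storage_offsets: offsets,
    };
    // Storage "w" starts at byte 4; range [1, 3) within it is bytes 5..7.
    assert_eq!(source.read_range("w", 1, 2).unwrap(), &[5, 6]);
}
```

Callers that truly need an owned `Vec<u8>` can still call `.to_vec()` on the returned slice; everyone else avoids the double copy.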
// tuple[2] is storage type class
let stype = match &tuple[2] {
    super::pickle_reader::Object::Class { name, .. } => name.clone(),
    _ => "FloatStorage".to_string(),
Silently defaulting to "FloatStorage" when the storage type is not a Class will mask real parsing errors. Return an error instead.
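A sketch of that fix, with `Object` as a simplified stand-in for the crate's `pickle_reader::Object`:

```rust
// Simplified stand-in for pickle_reader::Object; variants are assumptions.
#[derive(Debug)]
enum Object {
    Class { module_name: String, name: String },
    Int(i64),
}

/// Extract the storage class name, surfacing unexpected pickle objects as
/// errors instead of silently assuming "FloatStorage".
fn storage_class_name(obj: &Object) -> Result<String, String> {
    match obj {
        Object::Class { name, .. } => Ok(name.clone()),
        other => Err(format!("expected a storage class object, got {other:?}")),
    }
}

fn main() {
    let class = Object::Class {
        module_name: "torch".into(),
        name: "FloatStorage".into(),
    };
    assert_eq!(storage_class_name(&class).unwrap(), "FloatStorage");
    assert!(storage_class_name(&Object::Int(0)).is_err());
}
```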
/// Legacy _rebuild_tensor function for PyTorch < 1.6.
/// Same as rebuild_tensor_v2 but with fewer arguments: (storage, storage_offset, size, stride)
fn rebuild_tensor(
This entire function (~440 lines, through line 748) is a near-exact copy of rebuild_tensor_v2 below. The only difference is 4 args vs 5 args (v2 adds requires_grad and backward_hooks).
The TODO at line 936 acknowledges this. Please extract the shared logic: parse the storage args into a struct, build the closure once. Both functions should be thin wrappers. ~400 lines of duplication makes this harder to maintain and will cause bugs when one copy gets updated without the other.
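A sketch of that shape — the argument struct, function names, and simplified types below are illustrative stand-ins, not the crate's real signatures:

```rust
// Stand-in types for illustration; the real code works with pickle objects
// and lazy tensor snapshots.
struct RebuildArgs {
    storage_key: String,
    storage_offset: usize,
    size: Vec<usize>,
    stride: Vec<usize>,
}

/// The ~400 shared lines live here once: resolve the storage, compute
/// offsets, and build the tensor.
fn rebuild_tensor_impl(args: RebuildArgs) -> String {
    format!(
        "{} @ {} shape {:?} stride {:?}",
        args.storage_key, args.storage_offset, args.size, args.stride
    )
}

/// Legacy _rebuild_tensor for PyTorch < 1.6: (storage, storage_offset, size, stride).
fn rebuild_tensor(
    storage_key: String,
    storage_offset: usize,
    size: Vec<usize>,
    stride: Vec<usize>,
) -> String {
    rebuild_tensor_impl(RebuildArgs { storage_key, storage_offset, size, stride })
}

/// _rebuild_tensor_v2 adds requires_grad (and backward_hooks), which are
/// irrelevant for weight loading and simply ignored.
fn rebuild_tensor_v2(
    storage_key: String,
    storage_offset: usize,
    size: Vec<usize>,
    stride: Vec<usize>,
    _requires_grad: bool,
) -> String {
    rebuild_tensor_impl(RebuildArgs { storage_key, storage_offset, size, stride })
}

fn main() {
    let a = rebuild_tensor("w".into(), 0, vec![2, 3], vec![3, 1]);
    let b = rebuild_tensor_v2("w".into(), 0, vec![2, 3], vec![3, 1], false);
    assert_eq!(a, b); // both thin wrappers share one implementation
}
```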
    module_name: _,
    name,
} => name.as_str(),
_ => "FloatStorage",
Same concern: defaulting to "FloatStorage" when the object type is unexpected will silently produce wrong results. Return an error for unexpected types.
@@ -792,6 +818,137 @@ fn load_legacy_pytorch_file_with_metadata(
    Ok((tensors, metadata))
This is the biggest gap: there are no integration tests for the TAR format. The existing test suite in tests/reader/mod.rs has 47 tests covering ZIP and legacy formats with Python-generated fixtures. TAR needs the same treatment.
Since modern torch.save() cannot produce TAR files (this format predates PyTorch 0.1.10), you will need a Python script that manually constructs the TAR archive structure, similar to how create_legacy_with_offsets.py works.
At minimum, please add tests for:
- TAR format detection (is_tar_file())
- Loading a float32 tensor from TAR and verifying values
- Loading multiple tensors (weight + bias) with correct shapes
- Loading different dtypes (float32, float64, int64)
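For the detection case, POSIX ustar archives carry the magic bytes "ustar" at offset 257 of the first header block, which gives a cheap sniff. Whether the crate's `is_tar_file()` checks exactly this is an assumption (pre-POSIX v7 tar has no magic at all):

```rust
// Hedged sketch: detect a ustar TAR archive by its magic at header offset 257.
// This is one plausible implementation, not necessarily what burn-store does.
fn is_tar_file(bytes: &[u8]) -> bool {
    bytes.len() >= 262 && &bytes[257..262] == b"ustar"
}

fn main() {
    // A minimal 512-byte header block with the ustar magic in place.
    let mut header = vec![0u8; 512];
    header[257..262].copy_from_slice(b"ustar");
    assert!(is_tar_file(&header));
    assert!(!is_tar_file(b"PK\x03\x04")); // ZIP magic, not TAR
}
```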
Thanks for the suggestion! Added 8 TAR format tests:
- test_tar_format_detection - TAR file detection
- test_tar_float32_tensor - float32 tensor loading
- test_tar_float64_tensor - float64 tensor loading
- test_tar_int64_tensor - int64 tensor loading
- test_tar_multiple_tensors - multiple tensors (weight + bias)
- test_tar_multi_dtype - mixed dtypes
- test_tar_2d_tensor_shape - 2D shape verification
- test_tar_metadata - metadata verification
let image2 = TestTensor::<4>::ones([1, 3, 64, 64], &device);
let distance = lpips.forward(image1, image2, Reduction::Mean);
let distance_value = distance.into_data().to_vec::<f32>().unwrap()[0];
println!("LPIPS VGG distance (black vs white): {}", distance_value);
Debug println! left in test code. Same at lines 787-789, 795-797, 815-817. Also the weight-printing block at lines 783-798 should be removed before merging.
pub struct SqueezeFeatureExtractor<B: Backend> {
    /// Conv1: 3 -> 64, kernel 3x3, stride 2
-   conv1: Conv2d<B>,
+   pub conv1: Conv2d<B>,
conv1 was changed from private to pub only to support the debug weight printing in the test. Once that debug block is removed, revert this back to private.
/// Load ImageNet pretrained backbone weights.
fn load_backbone_weights<B: Backend>(
    lpips: Lpips<B>,
    _net: LpipsNet,
_net is unused since you match on the Lpips enum variant. Remove it. Same for load_lpips_weights at line 189.
- add TAR format integration tests
- deduplicate rebuild_tensor functions
- fix BFloat16 bug and silent F32 defaults
- remove unused parameters and debug println!
- remove _net
- remove println
Thanks for your review!
antimora left a comment
Looks good, all the feedback has been addressed. Thanks for the thorough rework!
Summary
- Add TAR format support to burn-store for legacy PyTorch models (AlexNet, SqueezeNet)
- Fix LPIPS to load ImageNet backbone weights separately from LPIPS linear weights
- Match PyTorch's original model structure

Checklist
- The cargo run-checks command has been executed.

Related Issues/PRs

Changes
LPIPS pretrained weights were not loading correctly. The original implementation only loaded LPIPS linear layer weights but not the ImageNet backbone weights (VGG16/AlexNet/SqueezeNet). Additionally, AlexNet and SqueezeNet backbone weights are stored in TAR format, which was not supported.

Solution
- burn-store: Added TAR format support for legacy PyTorch models (pre-1.6)
  - TarSource in lazy_data.rs
  - TAR file detection and loading in reader.rs
- burn-train/lpips: Fixed pretrained weights loading
  - Loads ImageNet backbone weights and LPIPS linear layer weights separately
  - Matches PyTorch's original model structure (features.X instead of net.sliceX)
  - Weight URLs from the official PyTorch repository

Testing