Fix default interpolation to BICUBIC for ViT, EfficientNet, PVT #43028
Conversation
The original Vision Transformer implementation uses BICUBIC interpolation for image preprocessing, but the Hugging Face image processor defaulted to BILINEAR. This change aligns the default with the original implementation.

Changes:
- Update the default resample from BILINEAR to BICUBIC in ViTImageProcessor
- Update the default resample in ViTImageProcessorFast
- Update docstrings to reflect BICUBIC

Reference: https://github.com/huggingface/pytorch-image-models

Fixes part of huggingface#28180
These models copy the resize method from ViT, so they need to be updated to match the new BICUBIC default interpolation.
[For maintainers] Suggested jobs to run (before merge): run-slow: efficientnet, imagegpt, layoutlmv2, layoutlmv3, pvt, segformer, vit
Quick clarification on the file changes: the second commit updates 6 additional models because they copy ViT's resize method. After changing ViT from BILINEAR to BICUBIC, these copies needed to stay in sync. This PR complements the ongoing interpolation fixes (#28180) by @lukepayyapilli for other models. Let me know if any changes are needed!
This makes sense and aligns ViT preprocessing with expected defaults (e.g. TIMM). Since this changes a default behavior, it might be worth explicitly calling it out as a breaking change for users relying on the previous interpolation. A small regression test asserting the default resampling value could also help prevent accidental future changes.
Hey, did you check this against the original implementation in each case?
After verification against the original implementations:
- ImageGPT: the OpenAI original uses BILINEAR
- Segformer: MMSegmentation uses BILINEAR by default
- LayoutLMv2: Microsoft/Detectron2 uses BILINEAR
- LayoutLMv3: Microsoft/Detectron2 uses BILINEAR

These models were incorrectly changed to BICUBIC. Only ViT, EfficientNet, and PVT should use BICUBIC (verified against timm).
Yes, verified against the original implementations.

BICUBIC confirmed (timm): ViT, EfficientNet, PVT

Reverted to BILINEAR: ImageGPT, Segformer, LayoutLMv2, LayoutLMv3. These use BILINEAR in their original repos (OpenAI, MMSegmentation, Microsoft/Detectron2).

Pushed a fix; the PR now only changes the three models that actually need BICUBIC.
cc @NielsRogge for review in that case!
What does this PR do?
Fixes incorrect default interpolation in ViT, EfficientNet, and PVT image processors. The original implementations use BICUBIC but HuggingFace defaulted to BILINEAR/NEAREST.
Changes: set the default resample to BICUBIC in the ViT, EfficientNet, and PVT image processors.

Verified against timm, where all three use BICUBIC: https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/transforms_factory.py#L75
Note: This is a breaking change for users relying on previous defaults.
Fixes part of #28180
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Linked issue: Verify interpolation of image processors #28180
Who can review?
@NielsRogge