Bug description
I am now using a distributed sampler with use_distributed_sampler=False, but I wish I didn't have to! Let me explain:
Before moving to multi-GPU, I had a simple custom sampler that worked fine on a single GPU without DDP. But once I switched to DDP and passed my sampler to Lightning, it wrapped the sampler in DistributedSamplerWrapper and ignored my sampler's ordering by setting shuffle=True. It took me a while to track this down, and once I did, I replaced my sampler with a distributed version of it, which then worked. But the whole process was unnecessary and time-consuming. The expected behaviour is that Lightning converts a custom sampler to a distributed one while respecting the original sampler's order.
From the Trainer API documentation for the use_distributed_sampler argument:
By default, it will add shuffle=True for the train sampler and shuffle=False for validation/test/predict samplers.
Setting shuffle=True on a user-provided sampler makes no sense and defeats the whole point of passing a custom sampler.
This was already raised in a previously closed issue (#21131), but was never resolved.
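For reference, the distributed replacement I wrote was roughly along the lines of the sketch below (class and variable names here are made up for illustration; it shards a fixed index order across ranks instead of reshuffling it), used together with Trainer(use_distributed_sampler=False):

```python
import torch.distributed as dist
from torch.utils.data import Sampler


class DistributedFixedOrderSampler(Sampler):
    """Distributed version of a fixed-order sampler: each rank takes every
    num_replicas-th index from the original order, preserving relative order."""

    def __init__(self, order, num_replicas=None, rank=None):
        # If the process group is not initialized yet, num_replicas and rank
        # must be passed explicitly (e.g. trainer.world_size / trainer.global_rank).
        self.num_replicas = num_replicas if num_replicas is not None else dist.get_world_size()
        self.rank = rank if rank is not None else dist.get_rank()
        # Shard the fixed order across ranks without reshuffling it.
        self.indices = list(order)[self.rank :: self.num_replicas]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)
```

Note that, unlike torch's DistributedSampler, this sketch does not pad to equal length across ranks, and it assumes the process group is already initialized. Having to write and debug this by hand is exactly the overhead I would like Lightning to avoid.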
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
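Here is a minimal sketch of the setup described above (the toy dataset, sampler, and module below are illustrative placeholders, not my actual code). Running it on 2 GPUs and printing the batches per rank should show shuffled indices instead of the fixed order supplied by the sampler:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler
import lightning as L


class ToyDataset(Dataset):
    """Scalar samples whose values equal their index, so the order is easy to see."""

    def __init__(self, n: int = 100):
        self.data = torch.arange(n, dtype=torch.float32).unsqueeze(1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class FixedOrderSampler(Sampler):
    """A simple custom sampler that yields indices in a deliberate, fixed order."""

    def __init__(self, order):
        self.order = list(order)

    def __iter__(self):
        return iter(self.order)

    def __len__(self):
        return len(self.order)


class ToyModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1)

    def training_step(self, batch, batch_idx):
        # Print what each rank actually receives to inspect the sampling order.
        print(f"rank={self.global_rank} batch={batch.flatten().tolist()}")
        return self.layer(batch).mean()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    dataset = ToyDataset()
    sampler = FixedOrderSampler(range(len(dataset)))  # any deliberate, non-shuffled order
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    # With the default use_distributed_sampler=True, Lightning replaces the custom
    # sampler with DistributedSamplerWrapper(..., shuffle=True) for training, so the
    # fixed order above is not respected across ranks.
    trainer = L.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
    trainer.fit(ToyModule(), loader)
```

With use_distributed_sampler=False and a hand-written distributed sampler like the one shown earlier, the intended order is preserved, but that should not be necessary.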
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response