
fix(dataset): make torchcodec cache fork-safe for num_workers>0#3327

Open
AjAnubolu wants to merge 1 commit into huggingface:main from AjAnubolu:fix/1488-torchcodec-fork-safety

Conversation

@AjAnubolu

Summary

The module-level _default_decoder_cache inherits stale fsspec/libav state across fork(), breaking the torchcodec backend under DataLoader workers (video decoding is the dominant __getitem__ cost for SmolVLA / Diffusion Policy training). This change stamps the cache with os.getpid() so it self-resets on first access in a forked child, and adds an opt-in lerobot_worker_init_fn helper.

Closes #1488. Complements #3123 (action tokenization perf).

…>0 (huggingface#1488)

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
@github-actions github-actions bot added the dataset Issues regarding data inputs, processing, or datasets label Apr 8, 2026


Development

Successfully merging this pull request may close these issues.

Dataloader is blazingly fast for ACT training but VERY slow for SmolVLA and Diffusion Policy training
