
fix(dataset): make torchcodec cache fork-safe for num_workers>0#3327

Open
AjAnubolu wants to merge 1 commit into huggingface:main from AjAnubolu:fix/1488-torchcodec-fork-safety

Conversation

@AjAnubolu

Summary

The module-level _default_decoder_cache inherits stale fsspec/libav state across fork(), breaking the torchcodec backend under DataLoader workers (video decoding is the dominant __getitem__ cost for SmolVLA / Diffusion Policy training). This change stamps the cache with os.getpid() so it self-resets on first access in a forked child, and adds an opt-in lerobot_worker_init_fn helper.

Closes #1488. Complements #3123 (action tokenization perf).

…>0 (huggingface#1488)

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
@github-actions github-actions bot added the dataset Issues regarding data inputs, processing, or datasets label Apr 8, 2026


Development

Successfully merging this pull request may close these issues.

Dataloader is blazingly fast for ACT training but VERY slow for SmolVLA and Diffusion Policy training
