Pull requests: huggingface/trl
Continuous Batching support for AsyncGRPO
#5781
opened May 16, 2026 by
qgallouedec
Member
•
Draft
8 tasks
Drop unjustified model.visual. skip in GRPO / RLOO Qwen2.5-VL tests
#5780
opened May 15, 2026 by
qgallouedec
Member
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip
#5779
opened May 15, 2026 by
qgallouedec
Member
Make the LLaVA / LLaVA-Next test guard explicit
#5778
opened May 15, 2026 by
qgallouedec
Member
Fix spurious KL gradients for zero-std reward groups when beta > 0
#5777
opened May 15, 2026 by
xodn348
Contributor
5 of 7 tasks
skip vision parts of the model for test_train_vlm_multi_image as well
#5774
opened May 15, 2026 by
kaixuanliu
Contributor
cleanup xpu cache memory after each test
#5771
opened May 15, 2026 by
kaixuanliu
Contributor
Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests
#5767
opened May 13, 2026 by
albertvillanova
Member
Memory-efficient PEFT/LoRA vLLM weight sync under DeepSpeed ZeRO-3
#5766
opened May 13, 2026 by
rak96
7 of 14 tasks
feat(grpo): replace deprecated use_transformers_paged with transformers continuous batching
#5765
opened May 13, 2026 by
sergiopaniego
Member
4 of 8 tasks
Add Qwen3-VL training chat template with generation markers
#5764
opened May 13, 2026 by
aazizyan
Contributor
5 of 8 tasks
docs: set max_completion_length=1024 in GRPO quickstart examples
#5759
opened May 13, 2026 by
dhruvnigam93
5 of 8 tasks
Tighten old_per_token_logps recomputation check in GRPO
#5757
opened May 12, 2026 by
wengeezhang
5 of 8 tasks
feat: move async rollout worker to separate process
#5749
opened May 11, 2026 by
AmineDiro
Member
3 tasks done
[AsyncGRPO] Fix missing tool gates in worker init (fixes #5742)
#5748
opened May 11, 2026 by
aazizyan
Contributor
5 of 8 tasks
Add end-to-end GRPO + OpenReward notebook (Local ORS / Toolathlon Gym / Qwen3.5-4B)
#5747
opened May 11, 2026 by
rycerzes
Contributor
3 of 8 tasks
Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct
#5739
opened May 9, 2026 by
qgallouedec
Member
Fix OpenRewardSpec omitting task‑scoped tools during rollout binding (fixes #5727)
#5729
opened May 7, 2026 by
rycerzes
Contributor
[gold] Implement seq_kd in GOLDTrainer
#5725
opened May 7, 2026 by
roycho96
Contributor
3 of 8 tasks
feat: add Falcon Mamba training chat templates with generation markers
#5723
opened May 7, 2026 by
DagaBhai
Contributor
4 of 8 tasks
Align tiny DeepSeekV3 config with deepseek-ai/DeepSeek-R1-0528
#5715
opened May 6, 2026 by
qgallouedec
Member
•
Draft
8 tasks