Pull requests: huggingface/trl

Pull requests list

Continuous Batching support for AsyncGRPO
#5781 opened May 16, 2026 by qgallouedec Member Draft
8 tasks
Make the LLaVA / LLaVA-Next test guard explicit
#5778 opened May 15, 2026 by qgallouedec Member
Fix spurious KL gradients for zero-std reward groups when beta > 0
#5777 opened May 15, 2026 by xodn348 Contributor
5 of 7 tasks
cleanup xpu cahce memory after each test
#5771 opened May 15, 2026 by kaixuanliu Contributor
Memory-efficient PEFT/LoRA vLLM weight sync under DeepSpeed ZeRO-3
#5766 opened May 13, 2026 by rak96
7 of 14 tasks
Add Qwen3-VL training chat template with generation markers
#5764 opened May 13, 2026 by aazizyan Contributor
5 of 8 tasks
docs: set max_completion_length=1024 in GRPO quickstart examples
#5759 opened May 13, 2026 by dhruvnigam93
5 of 8 tasks
Add telemetry to trainers
#5758 opened May 12, 2026 by qgallouedec Member
Tighten old_per_token_logps recomputation check in GRPO
#5757 opened May 12, 2026 by wengeezhang
5 of 8 tasks
async_grpo don't return on queue.Empty
#5751 opened May 12, 2026 by AmineDiro Member
feat: move async rollout worker to separate process
#5749 opened May 11, 2026 by AmineDiro Member
3 tasks done
[AsyncGRPO] Fix missing tool gates in worker init (fixes #5742)
#5748 opened May 11, 2026 by aazizyan Contributor
5 of 8 tasks
Add end-to-end GRPO + OpenReward notebook (Local ORS / Toolathlon Gym / Qwen3.5-4B)
#5747 opened May 11, 2026 by rycerzes Contributor
3 of 8 tasks
Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct
#5739 opened May 9, 2026 by qgallouedec Member
cleanup vram
#5738 opened May 9, 2026 by ved1beta
3 of 8 tasks
[gold] Implement seq_kd in GOLDTrainer
#5725 opened May 7, 2026 by roycho96 Contributor
3 of 8 tasks
feat: add Falcon Mamba training chat templates with generation markers
#5723 opened May 7, 2026 by DagaBhai Contributor
4 of 8 tasks
[feat, algo] Support RandOpt algo
#5719 opened May 7, 2026 by sunrainyg
4 of 8 tasks