Pull requests: huggingface/trl
AsyncGRPOTrainer: add PEFT/LoRA support
#5896
opened May 31, 2026 by
rycerzes
Contributor
5 of 8 tasks
AsyncGRPOTrainer: add ProcessorMixin handling
#5895
opened May 31, 2026 by
rycerzes
Contributor
5 of 8 tasks
AsyncGRPOTrainer: add sampling parameters (top_p, top_k, min_p, repetition_penalty)
#5894
opened May 31, 2026 by
rycerzes
Contributor
5 of 8 tasks
AsyncGRPOTrainer: add model_init_kwargs support
#5893
opened May 31, 2026 by
rycerzes
Contributor
5 of 8 tasks
async grpo native weight sync with vllm>=0.22.0
#5892
opened May 30, 2026 by
AmineDiro
Member
Fix GRPO use_liger_kernel under DeepSpeed ZeRO-3
#5891
opened May 30, 2026 by
kashif
Collaborator
fix(grpo,rloo): apply generation_config override in use_transformers_paged path
#5888
opened May 30, 2026 by
Sumu004
4 of 8 tasks
Cross-tokenizer alignment via byte offsets in GOLD trainer
#5885
opened May 29, 2026 by
kashif
Collaborator
4 of 8 tasks
[2/2] refactor: decoupled self distillation trainers; cleanup
#5883
opened May 29, 2026 by
LeonEricsson
Collaborator
8 tasks
DPOTrainer: eagerly delete intermediate logits tensors to reduce peak memory
#5882
opened May 29, 2026 by
flutist
Contributor
[DPOTrainer] Drop images when max_length truncation causes token/feature mismatch
#5881
opened May 29, 2026 by
flutist
Contributor
Simplify reference model handling in GRPO/RLOO
#5877
opened May 29, 2026 by
albertvillanova
Member
Simplify reference model handling in DPO
#5876
opened May 29, 2026 by
albertvillanova
Member
Fix loss_type="chunked_nll" under DeepSpeed ZeRO-3
#5873
opened May 27, 2026 by
qgallouedec
Member
Add Idefics3 original and training chat template with generation markers
#5871
opened May 27, 2026 by
aazizyan
Contributor
Removed generate_rollout_completions
#5870
opened May 27, 2026 by
sergiopaniego
Member
8 tasks
Add SmolVLM original and training chat template with generation markers
#5868
opened May 27, 2026 by
aazizyan
Contributor
[1/2] refactor: decoupled self distillation trainers (sdpo, sdft, ...)
#5862
opened May 27, 2026 by
LeonEricsson
Collaborator
4 of 12 tasks
[AsyncGRPO] Rollout worker: set aiohttp limit to max(100, max_inflight_tasks)
#5861
opened May 27, 2026 by
ggcr
3 of 8 tasks
Handle empty conversational fields in dataset format checks
#5860
opened May 27, 2026 by
emery-Xu
3 of 8 tasks
Fix Qwen3.5 vLLM weight name remapping
#5858
opened May 27, 2026 by
haimianxing
5 of 6 tasks
Support non-lm_head output projections in chunked SFT loss (GPTNeoX)
#5857
opened May 26, 2026 by
qgallouedec
Member