luckeciano/Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabel • Text Generation • Updated 29 days ago • 1.48k
luckeciano/Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothVF0.1 • Text Generation • Updated 28 days ago • 8
luckeciano/Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabelNormAdv • Text Generation • Updated 21 days ago • 1.15k
hdong0/Qwen2.5-Math-1.5B-batch-mix-Open-R1-GRPO_100steps_lr1e-6 • Text Generation • Updated 18 days ago • 7
hdong0/Qwen2.5-Math-1.5B-batch-mix-Open-R1-GRPO_100steps_lr1e-6_acc • Text Generation • Updated 15 days ago • 41
hdong0/deepseek-Qwen2.5-Math-1.5B-Open-R1-GRPO_100steps_lr1e-6_acc • Text Generation • Updated 16 days ago • 19
hdong0/Qwen2.5-Math-1.5B-Open-R1-GRPO_MATH_1000steps_lr1e-6_kl1e-3_acc • Text Generation • Updated 13 days ago • 63