luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-NoKL-Marg-NormAdv Text Generation • Updated Apr 20 • 10
luckeciano/Qwen-2.5-7B-RL-LACPO-NoBaselineNoKLNoEntropy0.5NoSmooth Text Generation • Updated Apr 30 • 6
luckeciano/Qwen-2.5-7B-RL-LACPO-NoBaselineNoKLNoEntropy0.5Smooth10 Text Generation • Updated 21 days ago • 4