AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward Text Generation • 2B • Updated about 14 hours ago • 14 • 1
davidoj01/unsloth-phi-4-Instruct-LORA-Open-R1-Code-GRPO-b2-as4-t07-lr1en5 Text Generation • Updated Apr 9 • 19