AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_294 • Reinforcement Learning • 2B • Updated 25 days ago • 488
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_588 • Reinforcement Learning • 2B • Updated about 1 month ago • 161
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1 • Reinforcement Learning • 2B • Updated Oct 3 • 5
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_882 • Reinforcement Learning • 2B • Updated 28 days ago • 654
hdong0/Qwen3-1.7B-base-Open-R1-GRPO_dapo_acc_4096_nokl • Text Generation • 2B • Updated 28 days ago • 132
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl • Text Generation • 8B • Updated 25 days ago • 122
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_4096_nokl • Text Generation • 8B • Updated 27 days ago • 171
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_2048_nokl • Text Generation • 8B • Updated 28 days ago • 145
AzalKhan/Qwen2.5-1.5B_open-r1-DAPO-Math-17k-Processed_294 • Reinforcement Learning • 2B • Updated 28 days ago • 178
AzalKhan/Qwen2.5-1.5B_open-r1-DAPO-Math-17k-Processed_588 • Reinforcement Learning • 2B • Updated 27 days ago • 178
AzalKhan/Qwen2.5-1.5B_open-r1-DAPO-Math-17k-Processed_882 • Reinforcement Learning • 2B • Updated 27 days ago • 179
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_2048_to_16384_nokl • Text Generation • 8B • Updated 23 days ago • 253
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_4096_to_16384_nokl • Text Generation • 8B • Updated 22 days ago • 135
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_8192_to_16384_nokl • Text Generation • 8B • Updated 20 days ago • 160
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_294_FlashRL_G4-L1024 • Reinforcement Learning • 2B • Updated 14 days ago • 12
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L1024 • Reinforcement Learning • 2B • Updated 13 days ago • 21
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_882_FlashRL_G4-L1024 • Reinforcement Learning • 2B • Updated 13 days ago • 25
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_1176_FlashRL_G4-L1024 • Reinforcement Learning • 2B • Updated 13 days ago • 162
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_294_FlashRL_G4-L2048_new • Reinforcement Learning • 2B • Updated 12 days ago • 471
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L2048_new • Reinforcement Learning • 2B • Updated 12 days ago • 334
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_882_FlashRL_G4-L2048_new • Reinforcement Learning • 2B • Updated 12 days ago • 330
AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_1176_FlashRL_G4-L2048_new • Reinforcement Learning • 2B • Updated 12 days ago • 478