hdong0/deepseek-Qwen-1.5B-batch-mix-Open-R1-GRPO_deepscaler_acc_seq_end_mask Updated 25 minutes ago • 3
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc Updated about 1 hour ago • 28
hdong0/Qwen2.5-Math-1.5B-batch-mix-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc_seq_end_mask_2 Updated about 2 hours ago • 4
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc_old Updated 1 day ago • 12
hdong0/Qwen2.5-Math-1.5B-batch-mix-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc_seq_end_mask_ Text Generation • Updated 1 day ago • 33
hdong0/Qwen2.5-Math-1.5B-Open-R1-Distill_deepmath_median_3epoch_GRPO_deepscaler_1000steps_lr1e-6_acc Updated 1 day ago • 38
hdong0/deepseek-Qwen2.5-Math-1.5B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc Text Generation • Updated 2 days ago • 60
hdong0/Qwen2.5-Math-1.5B-baseline-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc_ Text Generation • Updated 2 days ago • 110
hdong0/Qwen2.5-Math-1.5B-batch-mix-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_kl1e-3_acc_seq_end_mask Text Generation • Updated 2 days ago • 85
hdong0/Qwen__Qwen2.5-Math-1.5B_num_erased_tokens_128_remove_think_prompt_1 Updated 15 days ago • 103k • 127