Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 50
LM-Parallel/grpo_llama-hs-v3_bs64_rollout5-lr1e-5-seq-weighted-kl0.01-20250319052012 Updated 1 day ago
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.001-sc10-bm10sbm15-20250411103359 Updated 1 day ago • 2
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.001-bm10-sbm15-nc-20250411054109 Updated 2 days ago
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.01-sc10-bm10sbm15-20250325133311 Updated 2 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 93
LM-Parallel/grpo_llama-hsp-v3-mar23-kl0.01-subcall-cond10-beam10-subbeam15-train-20250325133311 Updated 19 days ago