On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published 7 days ago • 6
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23 • 9
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 209
meta-llama/Llama-2-13b-chat-hf Text Generation • 13B • Updated Apr 17, 2024 • 177k • 1.1k
openai/clip-vit-base-patch32 Zero-Shot Image Classification • Updated Feb 29, 2024 • 18.6M • 745
distilbert/distilbert-base-multilingual-cased Fill-Mask • 0.1B • Updated May 6, 2024 • 983k • 210