- The Alignment Waltz: Jointly Training Agents to Collaborate for Safety (AI at Meta, submitted by jackzhang)
- Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense (AI at Meta, submitted by Kylin-ll)
- OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows (AI at Meta, submitted by nielsr)
- CWM: An Open-Weights LLM for Research on Code Generation with World Models (AI at Meta, submitted by jacobkahn)
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning (AI at Meta, submitted by weizhepei)
- The Era of Real-World Human Interaction: RL from User Conversations (AI at Meta, submitted by Chuanyang-Jin)