WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue Paper • 2506.01881 • Published 8 days ago • 6
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published 14 days ago • 26
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published 22 days ago • 35
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 66