Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11, 2025 • 116
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device Paper • 2602.20161 • Published Feb 23 • 23
mistralai/Voxtral-Mini-4B-Realtime-2602 Automatic Speech Recognition • 4B • Updated 29 days ago • 890k • 800
Robust and Calibrated Detection of Authentic Multimedia Content Paper • 2512.15182 • Published Dec 17, 2025 • 17
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137