xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations Paper • 2506.13651 • Published Jun 16 • 9
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24 • 22
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Paper • 2312.00849 • Published Dec 1, 2023 • 12