When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research • Paper • 2505.11855 • Published 24 days ago • 9
[ICML 2025] Robustness in RMs • Collection • Dataset and reward models for "On the Robustness of Reward Models for Language Model Alignment (ICML 2025)" • 8 items • Updated 14 days ago