QuRating: Selecting High-Quality Data for Training Language Models Paper • 2402.09739 • Published Feb 15, 2024 • 4
Lost in the Logic: An Evaluation of Large Language Models' Reasoning Capabilities on LSAT Logic Games Paper • 2409.19012 • Published Sep 23, 2024
Reward Bench 2 Collection Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated 4 days ago • 8