answer-matching • Collection • Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation" • 2 items • Updated 19 days ago
Answer Matching Outperforms Multiple Choice for Language Model Evaluation • Paper • 2507.02856 • Published 19 days ago