nikhilchandak 's Collections

answer-matching

Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation"