DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Abstract
DISCO selects the test samples on which models disagree most and uses them to predict full-benchmark performance, achieving state-of-the-art results across benchmarks at a fraction of the computational cost.
Evaluating modern machine learning models has become prohibitively expensive: benchmarks such as LMMs-Eval and HELM demand thousands of GPU hours per model. Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact. The typical remedy follows two steps: first, select an anchor subset of the data; second, train a mapping from accuracy on this subset to the final test result. The drawback is that anchor selection depends on clustering, which can be complex and sensitive to design choices. We argue that promoting diversity among the samples themselves is not essential; what matters is selecting samples that maximise diversity in model responses. Our method, Diversifying Sample Condensation (DISCO), selects the top-k samples with the greatest inter-model disagreement. Because it relies on greedy, sample-wise statistics rather than global clustering, the approach is conceptually simpler. From a theoretical standpoint, inter-model disagreement provides an information-theoretically optimal rule for such greedy selection. Empirically, DISCO improves on prior methods, achieving state-of-the-art performance prediction across MMLU, Hellaswag, Winogrande, and ARC. Code is available at https://github.com/arubique/disco-public.
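As a rough illustration of the selection step (a sketch, not the authors' implementation), the greedy rule could look like the code below. It assumes integer class predictions from a pool of reference models are already available and scores each sample by the entropy of the predicted-label distribution across models, one simple way to quantify inter-model disagreement; the function and variable names are ours.

```python
import numpy as np

def disagreement_scores(preds: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-sample inter-model disagreement, scored as the entropy of the
    distribution of predicted labels across a pool of reference models.

    preds: shape (num_models, num_samples), integer class predictions.
    Returns shape (num_samples,); higher means more disagreement.
    """
    num_models, num_samples = preds.shape
    scores = np.zeros(num_samples)
    for j in range(num_samples):
        counts = np.bincount(preds[:, j], minlength=num_classes)
        p = counts / num_models
        p = p[p > 0]                      # drop zero-probability labels
        scores[j] = -(p * np.log(p)).sum()
    return scores

def select_top_k(preds: np.ndarray, num_classes: int, k: int) -> np.ndarray:
    """Greedy, sample-wise selection: keep the k samples the reference
    models disagree on most. No clustering involved."""
    scores = disagreement_scores(preds, num_classes)
    return np.argsort(-scores)[:k]

# Toy usage: 5 reference models, 1000 binary-choice samples, keep 10 anchors.
rng = np.random.default_rng(0)
preds = rng.integers(0, 2, size=(5, 1000))
anchors = select_top_k(preds, num_classes=2, k=10)
```

In practice the predictions would come from a pool of previously evaluated source models rather than random draws, and the entropy score here stands in for whatever disagreement statistic the paper uses.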
Community
How can you evaluate your LLMs on benchmarks like MMLU at 1% of the cost?
The answer is in our new paper, where we show that a model's outputs on a small subset of test samples, chosen to maximise diversity in model responses, are highly predictive of its performance on the full dataset.
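For the prediction stage, here is a minimal sketch of the idea under simplifying assumptions: given both anchor-subset and full-benchmark accuracies for a set of already-evaluated source models, fit a simple mapping and use it to estimate a new model's full score from its subset accuracy alone. The numbers and the linear-regression choice are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical data: for each source model, its accuracy on the selected
# anchor subset and on the full benchmark.
subset_acc = np.array([0.41, 0.48, 0.55, 0.60, 0.67, 0.72])  # accuracy on anchors
full_acc   = np.array([0.52, 0.57, 0.61, 0.66, 0.70, 0.74])  # accuracy on full set

# Fit a simple linear map subset_acc -> full_acc (illustrative choice).
slope, intercept = np.polyfit(subset_acc, full_acc, deg=1)

def predict_full_accuracy(new_subset_acc: float) -> float:
    """Estimate full-benchmark accuracy from accuracy on the anchor subset."""
    return slope * new_subset_acc + intercept

# A new model scores 0.63 on the anchors; estimate its full-benchmark score.
print(round(predict_full_accuracy(0.63), 3))
```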
Project page: https://arubique.github.io/disco-site/
Paper: https://arxiv.org/abs/2510.07959
Code: https://github.com/arubique/disco-public
Big thanks to my co-authors Benjamin Raible, Martin Gubri, and Seong Joon Oh
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Learning Compact Representations of LLM Abilities via Item Response Theory (2025)
- Toward a unified framework for data-efficient evaluation of large language models (2025)
- LLMRank: Understanding LLM Strengths for Model Routing (2025)
- Look Before you Leap: Estimating LLM Benchmark Scores from Descriptions (2025)
- NIRVANA: Structured pruning reimagined for large language models compression (2025)
- GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings (2025)
- LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference (2025)