Papers
arxiv:2510.07959

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

Published on Oct 9 · Submitted by Alexander Rubinstein on Oct 13
Authors: Alexander Rubinstein, Benjamin Raible, Martin Gubri, Seong Joon Oh

Abstract

AI-generated summary

A method called DISCO selects the test samples on which models disagree most and uses them to predict full-benchmark performance, achieving state-of-the-art results across various benchmarks at reduced computational cost.

Evaluating modern machine learning models has become prohibitively expensive. Benchmarks such as LMMs-Eval and HELM demand thousands of GPU hours per model. Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact. The typical approach follows two steps. First, select an anchor subset of data. Second, train a mapping from the accuracy on this subset to the final test result. The drawback is that anchor selection depends on clustering, which can be complex and sensitive to design choices. We argue that promoting diversity among samples is not essential; what matters is to select samples that maximise diversity in model responses. Our method, Diversifying Sample Condensation (DISCO), selects the top-k samples with the greatest model disagreements. This uses greedy, sample-wise statistics rather than global clustering, making the approach conceptually simpler. From a theoretical view, inter-model disagreement provides an information-theoretically optimal rule for such greedy selection. DISCO shows empirical gains over prior methods, achieving state-of-the-art results in performance prediction across MMLU, Hellaswag, Winogrande, and ARC. Code is available here: https://github.com/arubique/disco-public.
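
To make the selection rule concrete, here is a minimal sketch of disagreement-based anchor selection. It assumes class predictions from a pool of source models are already available for the candidate test samples, and it scores disagreement by the entropy of the model votes per sample; the function name and exact disagreement measure are illustrative assumptions, not the official DISCO implementation (see the linked repository for that).

```python
import numpy as np


def select_anchor_samples(predictions: np.ndarray, k: int) -> np.ndarray:
    """predictions: (n_models, n_samples) array of predicted class labels.

    Returns the indices of the k samples with the highest inter-model
    disagreement, scored here by the entropy of the model-vote distribution.
    """
    n_models, n_samples = predictions.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(predictions[:, j], return_counts=True)
        probs = counts / n_models
        scores[j] = -(probs * np.log(probs)).sum()  # vote entropy for sample j
    # Greedy, sample-wise selection: simply take the k highest-disagreement samples.
    return np.argsort(scores)[::-1][:k]


# Example: 5 source models, 1000 test samples, keep a 1% anchor subset.
rng = np.random.default_rng(0)
preds = rng.integers(0, 4, size=(5, 1000))  # fake class predictions in {0..3}
anchor_idx = select_anchor_samples(preds, k=10)
print(anchor_idx)
```

Because the score is computed independently per sample, selection reduces to a single greedy top-k pass with no global clustering step.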

Community

Paper submitter

How can you evaluate your LLMs on benchmarks like MMLU at 1% of the cost?

The answer is in our new paper, where we show that model outputs on a small subset of test samples chosen to maximise diversity in model responses are highly predictive of full-dataset performance.
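
For the second step of the pipeline described in the abstract (mapping anchor-subset accuracy to full-benchmark accuracy), a hedged sketch under simple assumptions is shown below: a linear regressor fitted on previously evaluated source models, with made-up accuracy numbers purely for illustration. The paper's actual fitting procedure may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Accuracies of previously evaluated source models (illustrative numbers only).
subset_acc = np.array([[0.42], [0.55], [0.61], [0.70], [0.78]])  # on the anchor subset
full_acc = np.array([0.40, 0.52, 0.60, 0.68, 0.75])              # on the full benchmark

# Fit a simple mapping from subset accuracy to full-benchmark accuracy.
mapper = LinearRegression().fit(subset_acc, full_acc)

# For a new model, evaluate only the anchor subset and predict the rest.
new_model_subset_acc = np.array([[0.66]])
predicted_full_acc = mapper.predict(new_model_subset_acc)[0]
print(f"Predicted full-benchmark accuracy: {predicted_full_acc:.3f}")
```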

Project page: https://arubique.github.io/disco-site/
Paper: https://arxiv.org/abs/2510.07959
Code: https://github.com/arubique/disco-public

Big thanks to my co-authors Benjamin Raible, Martin Gubri, and Seong Joon Oh.
