Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
Abstract
TestCase-Eval is a benchmark for evaluating LLMs in generating comprehensive and targeted test cases for algorithm problems.
We introduce TestCase-Eval, a new benchmark for the systematic evaluation of LLMs in test-case generation. TestCase-Eval comprises 500 algorithm problems and 100,000 human-crafted solutions from the Codeforces platform. It focuses on two pivotal tasks: (1) Fault Coverage, which measures how well an LLM-generated test set probes diverse input scenarios and covers a wide range of potential failure modes, and (2) Fault Exposure, which evaluates whether an LLM can craft a tailored test input that reveals a specific incorrect code implementation. We provide a comprehensive assessment of 19 state-of-the-art open-source and proprietary LLMs on TestCase-Eval, offering insights into their strengths and limitations in generating effective test cases for algorithm problems.
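As a rough illustration of how these two tasks could be operationalized, the minimal Python sketch below models solutions as callables and scores a generated test set against a pool of known-incorrect submissions. The function names and scoring details are illustrative assumptions, not the paper's evaluation harness; in the actual benchmark, submissions are real Codeforces programs run under a judge.

```python
# Illustrative sketch of Fault Coverage and Fault Exposure scoring
# (assumed formulation, not the paper's official harness).

from typing import Callable, Iterable, List

Solution = Callable[[str], str]  # maps a test-input string to an output string


def fails(reference: Solution, candidate: Solution, test_input: str) -> bool:
    """A candidate is 'exposed' by a test if its output differs from the
    reference solution, or it raises (analogous to a wrong-answer or
    runtime-error verdict)."""
    try:
        return candidate(test_input) != reference(test_input)
    except Exception:
        return True


def fault_coverage(reference: Solution,
                   incorrect_pool: List[Solution],
                   generated_tests: Iterable[str]) -> float:
    """Fraction of known-incorrect solutions that fail on at least one
    test in the LLM-generated test set."""
    tests = list(generated_tests)
    if not incorrect_pool:
        return 0.0
    exposed = sum(
        any(fails(reference, bad, t) for t in tests) for bad in incorrect_pool
    )
    return exposed / len(incorrect_pool)


def fault_exposure(reference: Solution,
                   target_incorrect: Solution,
                   tailored_test: str) -> bool:
    """Does a single tailored test input reveal the specific buggy
    implementation it was crafted for?"""
    return fails(reference, target_incorrect, tailored_test)


if __name__ == "__main__":
    # Toy problem: print the maximum of two space-separated integers.
    correct = lambda s: str(max(map(int, s.split())))
    buggy_min = lambda s: str(min(map(int, s.split())))            # wrong on unequal inputs
    buggy_avg = lambda s: str(sum(map(int, s.split())) // 2)       # wrong in general

    tests = ["3 3", "1 9"]  # an LLM-generated test set
    print(fault_coverage(correct, [buggy_min, buggy_avg], tests))  # 1.0
    print(fault_exposure(correct, buggy_min, "5 2"))               # True
```

In this framing, Fault Coverage rewards test sets that collectively distinguish many distinct buggy programs from the reference, while Fault Exposure is a per-target check on a single tailored input.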
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems (2025)
- Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition? (2025)
- On Mutation-Guided Unit Test Generation (2025)
- DS-Bench: A Realistic Benchmark for Data Science Code Generation (2025)
- LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming (2025)
- CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation (2025)
- Large Language Models for IT Automation Tasks: Are We There Yet? (2025)