EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities Paper • 2409.16165 • Published Sep 24, 2024
AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research Paper • 2507.08038 • Published Jul 9
AblationBench Collection This is a collection of datasets used to evaluate language models on the task of ablation planning in empirical AI research. • 4 items • Updated May 16 • 5