Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA Paper • 2504.10419 • Published Apr 14
In Case You Missed It: ARC 'Challenge' Is Not That Challenging Paper • 2412.17758 • Published Dec 23, 2024
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists Paper • 2410.23331 • Published Oct 30, 2024