FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published 1 day ago • 20
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 22
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 19
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents Paper • 2406.13923 • Published Jun 20, 2024 • 23