MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published 24 days ago • 1
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5 • 31
Objaverse++: Curated 3D Object Dataset with Quality Annotations Paper • 2504.07334 • Published Apr 9 • 1
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 66
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 103
MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies Paper • 2501.15384 • Published Jan 26
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving Paper • 2412.10734 • Published Dec 14, 2024 • 1
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 23
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks Paper • 2410.06526 • Published Oct 9, 2024 • 1
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 19
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents Paper • 2406.13923 • Published Jun 20, 2024 • 23
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024 • 50