FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models (Paper • arXiv:2505.02735 • Published May 5)
DMind Benchmark: The First Comprehensive Benchmark for LLM Evaluation in the Web3 Domain (Paper • arXiv:2504.16116 • Published Apr 18)