CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 9
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3 • 17
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3 • 17
Magpie Reasoning Datasets Collection Reasoning datasets built by Magpie and its friends! • 8 items • Updated Jan 27 • 10
DongfuJiang/prm_gsm_2k_with_full_sol_mix_ref_remove_all_correct_hf Text Generation • Updated Dec 1, 2024 • 5 • 1