Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 70
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Paper • 2407.17468 • Published Jul 24, 2024
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 9
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models Paper • 2503.12072 • Published Mar 15
On Grounded Planning for Embodied Tasks with Language Models Paper • 2209.00465 • Published Aug 29, 2022 • 1
Optimizing Language Model's Reasoning Abilities with Weak Supervision Paper • 2405.04086 • Published May 7, 2024 • 2
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning Paper • 2410.10074 • Published Oct 14, 2024 • 1
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published Feb 17 • 37
ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published Feb 3 • 28
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3 • 17
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning Paper • 2305.15065 • Published May 24, 2023 • 1