Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published 4 days ago • 28
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published Jun 5 • 34
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning Paper • 2506.04245 • Published May 29 • 4
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 44
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning Paper • 2505.24273 • Published May 30 • 4
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published May 19 • 26
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations Paper • 2505.02819 • Published May 5 • 25
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning Paper • 2504.19162 • Published Apr 27 • 18
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 47
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 39
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference Paper • 2502.09974 • Published Feb 14 • 9
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published Feb 10 • 18
Recurrent Models Collection These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space. • 15 items • Updated May 21 • 9