ARGUS: Hallucination and Omission Evaluation in Video-LLMs Paper • 2506.07371 • Published 2 days ago • 8
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published 5 days ago • 31
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning Paper • 2506.04245 • Published 12 days ago • 4
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published 5 days ago • 35
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning Paper • 2505.24273 • Published 12 days ago • 4
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published 22 days ago • 26
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations Paper • 2505.02819 • Published May 5 • 24
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning Paper • 2504.19162 • Published Apr 27 • 17
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 46
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 39
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference Paper • 2502.09974 • Published Feb 14 • 9
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published Feb 10 • 18
Recurrent Models Collection These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space. • 15 items • Updated 20 days ago • 7
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 142