Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 11 days ago • 18
Running • 2.15k • The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published about 1 month ago • 122
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published Feb 5 • 43 • 5
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published Feb 6 • 31
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023 • 5
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Paper • 2403.03218 • Published Mar 5, 2024 • 1