FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation • Paper 2506.04956 • Published 6 days ago • 3 upvotes
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs • Paper 2504.07866 • Published Apr 10 • 11 upvotes
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper 2505.03335 • Published May 6 • 170 upvotes
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale • Paper 2505.03005 • Published May 5 • 32 upvotes
Volume estimates for unions of convex sets, and the Kakeya set conjecture in three dimensions • Paper 2502.17655 • Published Feb 24 • 1 upvote
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention • Paper 2504.06261 • Published Apr 8 • 110 upvotes
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text • Paper 2501.15654 • Published Jan 26 • 15 upvotes
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference • Paper 2503.13427 • Published Mar 17 • 3 upvotes
RWKV-7 "Goose" with Expressive Dynamic State Evolution • Paper 2503.14456 • Published Mar 18 • 149 upvotes
Qwen2.5 Collection • Qwen2.5 language models, pretrained and instruction-tuned, in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B) • 46 items • Updated Apr 28 • 618 upvotes
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning • Paper 2502.06060 • Published Feb 9 • 37 upvotes
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets • Paper 2410.01779 • Published Oct 2, 2024 • 2 upvotes