Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published May 19 • 27
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 30
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published Jun 24, 2024 • 19