Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 26 days ago • 39 • 26
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models Paper • 2412.07171 • Published Dec 10, 2024 • 1 • 1
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding Paper • 2501.00712 • Published 17 days ago • 6 • 4
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 26 days ago • 29 • 5
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12, 2024 • 75 • 7
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28, 2024 • 19 • 12
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 184 • 15
Resonance RoPE: Improving Context Length Generalization of Large Language Models Paper • 2403.00071 • Published Feb 29, 2024 • 23 • 2