The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE • Paper • 2502.17391 • Published 13 days ago • 1
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures • Paper • 2502.00617 • Published Feb 2 • 1
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper • 2502.08910 • Published 25 days ago • 143
Demystifying the Token Dynamics of Deep Selective State Space Models • Paper • 2410.03292 • Published Oct 4, 2024 • 1
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? • Paper • 2305.07759 • Published May 12, 2023 • 34
genCNN: A Convolutional Architecture for Word Sequence Prediction • Paper • 1503.05034 • Published Mar 17, 2015 • 1
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model • Paper • 2401.09417 • Published Jan 17, 2024 • 61
LION: Linear Group RNN for 3D Object Detection in Point Clouds • Paper • 2407.18232 • Published Jul 25, 2024 • 2
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers • Paper • 2502.15007 • Published 17 days ago • 160
WebGames: Challenging General-Purpose Web-Browsing AI Agents • Paper • 2502.18356 • Published 12 days ago • 11
From Markov to Laplace: How Mamba In-Context Learns Markov Chains • Paper • 2502.10178 • Published 24 days ago • 1