ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations Paper • 2505.02819 • Published May 5 • 24
Article: Bamba: Inference-Efficient Hybrid Mamba2 Model By rganti and 28 others • Dec 18, 2024 • 55
March 2025 - Open releases from the Chinese community Collection • 32 items • Updated 22 days ago • 13
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28 • 26
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published Feb 12 • 58
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13 • 14
Aira Collection: Aira is a series of chatbots developed as an experimentation playground for value alignment. • 27 items • Updated Jun 20, 2024 • 1
Loxa Collection: Loxa family models are presented as the best models for running on CPU and GPU with high quality (≥92% accuracy). • 5 items • Updated Feb 3 • 2
Quadrifoglio Collection: Small text2text models finetuned on Italian machine translation tasks. • 6 items • Updated Jan 12 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 150
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
FluidML: Fast and Memory Efficient Inference Optimization Paper • 2411.09242 • Published Nov 14, 2024 • 1
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 65