SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 29 days ago • 180
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 97
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
A Dataset and Strong Baselines for Classification of Czech News Texts Paper • 2307.10666 • Published Jul 20, 2023
Evaluating the Search Phase of Neural Architecture Search Paper • 1902.08142 • Published Feb 21, 2019
Landmark Attention: Random-Access Infinite Context Length for Transformers Paper • 2305.16300 • Published May 25, 2023