Article: Gotchas in Tokenizer Behavior Every Developer Should Know. By qgallouedec • Apr 18 • 41
Collection: Sparse Foundational Llama 2 Models. Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras • 27 items • Updated Apr 16 • 9
Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention. By sirluk • Oct 7, 2024 • 47
Collection: 📚 FineWeb-Edu. FineWeb-Edu datasets, classifier and ablation model • 5 items • Updated Jun 12, 2024 • 15