Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 7
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated 23 days ago • 150
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 92
Article Hugging Face on PyTorch / XLA TPUs By jysohn23 and 1 other • Feb 9, 2021 • 3
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 39
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 162