ReFT: Representation Finetuning for Language Models Paper • 2404.03592 • Published Apr 4, 2024 • 99
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 21 days ago • 145
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 27 days ago • 113
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29 • 91
Article Train your first Decision Transformer By edbeeching and 1 other • Sep 8, 2022 • 12
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 151
Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 90
Article Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained - What's Really Changing in Transformers? By Kseniase and 1 other • Apr 4 • 14
Article Introducing the Synthetic Data Generator - Build Datasets with Natural Language By davidberenstein1957 and 5 others • Dec 16, 2024 • 128
Article Introducing RWKV - An RNN with the advantages of a transformer By BlinkDL and 3 others • May 15, 2023 • 21
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published Mar 24 • 19
Article Open-Source Handwritten Signature Detection Model By samuellimabraz • Mar 14 • 113
Article Introducing smolagents: simple agents that write actions in code. By m-ric and 2 others • Dec 31, 2024 • 1.06k
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 54
Article Zero to Hero with the TRL learning link bomb 💣 By burtenshaw • Nov 25, 2024 • 6