Niels Horn
nilq
AI & ML interests
Natural language understanding, synthetic emotional speech, mechanistic interpretability.
Merged Toy Models
- nilq/lua-stories-linear-mistral-1L-tiny
  Text Generation • 0.0B • Updated • 6
- nilq/lua-stories-slerp-mistral-1L-tiny
  Text Generation • 0.0B • Updated • 4
- nilq/lua-stories-slerp-mistral-2L-tiny
  Text Generation • 0.0B • Updated • 3
- nilq/baby-python-1L-mistral-lua-stories-slerp
  Text Generation • 0.0B • Updated • 4
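The `slerp` in these model names refers to spherical linear interpolation of model weights, an alternative to plain linear averaging that interpolates along the great circle between two checkpoints. A minimal sketch, treating each checkpoint as a flat parameter vector; this helper is illustrative and is not the merge script used for these models:

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two flat parameter vectors.

    Falls back to plain linear interpolation when the vectors are
    nearly parallel (sin of the angle between them is ~0).
    """
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    if abs(math.sin(omega)) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Halfway between orthogonal unit vectors stays on the unit circle.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # ≈ [0.707, 0.707]
```

In a real merge the same interpolation is applied per tensor of the two models' state dicts; tools like mergekit expose this as a `slerp` merge method.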
Dynamics of Transformer Language Model Features
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 7
- Diverse Weight Averaging for Out-of-Distribution Generalization
  Paper • 2205.09739 • Published • 1
- Fusing finetuned models for better pretraining
  Paper • 2204.03044 • Published • 6
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
  Paper • 2309.07311 • Published • 4
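The model-soup recipe from the first paper above (2203.05482) is a uniform average of the parameters of several fine-tuned checkpoints of the same architecture. A minimal sketch, with plain dicts of lists standing in for tensor state dicts; with PyTorch you would average the tensors of each `model.state_dict()` the same way:

```python
def uniform_soup(state_dicts):
    """Average corresponding parameters across fine-tuned checkpoints.

    All checkpoints must share the same architecture, i.e. the same
    parameter names and shapes.
    """
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }

# Two toy "checkpoints" with a single two-element parameter.
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
print(uniform_soup([a, b]))  # {'w': [2.0, 3.0]}
```

The paper's "greedy soup" variant adds checkpoints to the average one at a time, keeping each only if held-out accuracy improves; the averaging step itself is unchanged.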
Toy Models to Study
Toy Base Models