Niels Horn
nilq
AI & ML interests
Natural language understanding, synthetic emotional speech, mechanistic interpretability.
Merged Toy Models
- nilq/lua-stories-linear-mistral-1L-tiny
  Text Generation • 0.0B • Updated • 6
- nilq/lua-stories-slerp-mistral-1L-tiny
  Text Generation • 0.0B • Updated • 4
- nilq/lua-stories-slerp-mistral-2L-tiny
  Text Generation • 0.0B • Updated • 3
- nilq/baby-python-1L-mistral-lua-stories-slerp
  Text Generation • 0.0B • Updated • 4
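The `slerp` in these model names refers to spherical linear interpolation of model weights, an alternative to plain linear averaging that interpolates along the great circle between two checkpoints. A minimal sketch, treating each checkpoint as a flat parameter vector; this helper is illustrative and is not the merge script used for these models:

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two flat parameter vectors.

    Falls back to plain linear interpolation when the vectors are
    nearly parallel (sin of the angle between them is ~0).
    """
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    if abs(math.sin(omega)) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Halfway between orthogonal unit vectors stays on the unit circle.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # ≈ [0.707, 0.707]
```

In a real merge the same interpolation is applied per tensor of the two models' state dicts; tools like mergekit expose this as a `slerp` merge method.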
Dynamics of Transformer Language Model Features
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 7
- Diverse Weight Averaging for Out-of-Distribution Generalization
  Paper • 2205.09739 • Published • 1
- Fusing finetuned models for better pretraining
  Paper • 2204.03044 • Published • 6
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
  Paper • 2309.07311 • Published • 4
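The model-soup recipe from the first paper above (2203.05482) is a uniform average of the parameters of several fine-tuned checkpoints of the same architecture. A minimal sketch, with plain dicts of lists standing in for tensor state dicts; with PyTorch you would average the tensors of each `model.state_dict()` the same way:

```python
def uniform_soup(state_dicts):
    """Average corresponding parameters across fine-tuned checkpoints.

    All checkpoints must share the same architecture, i.e. the same
    parameter names and shapes.
    """
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }

# Two toy "checkpoints" with a single two-element parameter.
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
print(uniform_soup([a, b]))  # {'w': [2.0, 3.0]}
```

The paper's "greedy soup" variant adds checkpoints to the average one at a time, keeping each only if held-out accuracy improves; the averaging step itself is unchanged.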
Toy Models to Study
Toy Base Models