Ekaterina (h1de0us)

AI & ML interests: None yet
Organizations: None yet
[to-read]
- A Survey of Small Language Models
  Paper • 2410.20011 • Published • 44
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  Paper • 2410.23168 • Published • 24
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 64
- GPT or BERT: why not both?
  Paper • 2410.24159 • Published • 14
TTS

Models (0): None public yet
Datasets (0): None public yet