The checkpoint for [Self-Adjust Softmax](https://arxiv.org/abs/2502.18277)
Gausson Tschen (Gausson)
AI & ML interests: LLM Architecture, Pre-training, Deep Neural Network Optimization, Sparsity
Models (9)

- Gausson/sep_cache (8B)
- Gausson/pythia-160m-deduped-n64-SepLLM (0.2B)
- Gausson/pythia-160m-deduped-n64h-SepLLM (0.2B)
- Gausson/pythia-160m-deduped-n64-StreamingLLM (0.2B)
- Gausson/pythia-160m-deduped-n64-RoBiPE-SepLLM (0.2B)
- Gausson/pythia-160m-deduped-n128-SepLLM (0.2B)
- Gausson/pythia-160m-deduped-SepLLM (0.2B)
- Gausson/pythia-160m-deduped-n64ht-SepLLM (0.2B)
- Gausson/gpt-neox-125m-deduped-SA (0.2B)
Datasets (0): none public yet.