Update README.md
README.md
CHANGED
@@ -1,3 +1,5 @@
+From scratch pretraining on English only: no synthetic data, no code, 3 epochs over 1 GB of data for the ~125M-parameter model.
+
 Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425). Other than some alterations to the attention, such as 16 heads instead of 9 and using TPA, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 
 # Scripts: