Blackroot committed on
Commit 9e51385 · verified · 1 parent: f2e6a64

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -1,3 +1,5 @@
+From-scratch pretraining on English only, with no synthetic data and no code: 3 epochs over 1 GB of data for the ~125M-parameter model.
+
 Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425). Apart from some changes to the attention, such as 16 heads instead of 9 and the use of TPA, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 
 # Scripts:
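The README names Tensor Product Attention (TPA) but does not describe it. The following minimal NumPy sketch illustrates the idea, assuming the formulation in the linked paper: each token's per-head queries, keys, and values are built as a scaled sum of rank-R outer products between a head-side factor and a feature-side factor, both linear in the token's hidden state. This is not the training code for this model. The helper names `tpa_factor` and `tpa_attention`, the ranks, and the toy dimensions are hypothetical. Only the 16 heads come from the README. RoPE, the output projection, and the factorized KV cache are omitted.

```python
import numpy as np


def tpa_factor(x, W_a, W_b, rank):
    """Per-token (n_heads, head_dim) tensors as a scaled sum of `rank` outer products.

    x:   (seq, d_model) hidden states
    W_a: (d_model, rank * n_heads)  projection to head-side factors a_r(x_t)
    W_b: (d_model, rank * head_dim) projection to feature-side factors b_r(x_t)
    Returns an array of shape (seq, n_heads, head_dim).
    """
    seq = x.shape[0]
    n_heads = W_a.shape[1] // rank
    head_dim = W_b.shape[1] // rank
    a = (x @ W_a).reshape(seq, rank, n_heads)
    b = (x @ W_b).reshape(seq, rank, head_dim)
    # sum_r a_r(x_t) (outer) b_r(x_t), divided by the rank
    return np.einsum("trh,trd->thd", a, b) / rank


def tpa_attention(x, params, ranks):
    """Causal multi-head attention with TPA-factorized Q, K, V (no RoPE, no output projection)."""
    q = tpa_factor(x, params["Wa_q"], params["Wb_q"], ranks["q"])
    k = tpa_factor(x, params["Wa_k"], params["Wb_k"], ranks["k"])
    v = tpa_factor(x, params["Wa_v"], params["Wb_v"], ranks["v"])
    seq, n_heads, head_dim = q.shape

    # Scaled dot-product scores per head: (n_heads, seq_q, seq_k)
    scores = np.einsum("thd,shd->hts", q, k) / np.sqrt(head_dim)
    # Causal mask: query t may not attend to keys s > t
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.einsum("hts,shd->thd", weights, v)
    # Concatenate heads: (seq, n_heads * head_dim)
    return out.reshape(seq, n_heads * head_dim)


# Toy configuration: 16 heads as in the README; every other value is hypothetical.
rng = np.random.default_rng(0)
seq, d_model, n_heads, head_dim = 5, 32, 16, 8
ranks = {"q": 6, "k": 2, "v": 2}

params = {}
for name, r in ranks.items():
    params[f"Wa_{name}"] = rng.standard_normal((d_model, r * n_heads)) / np.sqrt(d_model)
    params[f"Wb_{name}"] = rng.standard_normal((d_model, r * head_dim)) / np.sqrt(d_model)

x = rng.standard_normal((seq, d_model))
out = tpa_attention(x, params, ranks)
print(out.shape)  # (5, 128)
```

Each per-head key and value matrix is a sum of only `R` outer products. An inference cache could therefore store the small factors `a` and `b` instead of the full `(n_heads, head_dim)` tensors, and this KV-cache saving is the main motivation the TPA paper gives.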