Blackroot
/

TensorProduct-Microllama

Text Generation

Model card Files Files and versions Community

Blackroot commited on Feb 7

Commit

9e51385

·

verified ·

1 Parent(s): f2e6a64

Update README.md

Files changed (1) hide show

README.md +2 -0

README.md CHANGED Viewed

@@ -1,3 +1,5 @@
 Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425). Other than some alterations to the attention, such as 16 heads insted of 9 and using TPA, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 # Scripts:

+From scratch pretraining on english only no synthetic data, no code, 3 epochs of 1 gig of data for the ~125M param model.
 Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425). Other than some alterations to the attention, such as 16 heads insted of 9 and using TPA, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
 # Scripts: