Blackroot/TensorProduct-Microllama

Blackroot committed on Feb 7

Commit f2e6a64 · verified · 1 Parent(s): 87050d7

Update README.md

Files changed (1)

README.md +1 -1

@@ -5,7 +5,7 @@ Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425).
 - `test_train.py` runs with the exact configurations used to train this model and is the reproduction script. Data is assumed to be in JSONL format with `"text":"example text", "text":"..."`
 # Notes:
-Memory effcicient, many of the benefits here are for inference which are not really being leveraged at all, although you can probably fit a larger bsz than traditional MHA/GQA with this. The run time is very similar to MHA/GQA at this scale.
+One of the primary reported benefits for TPA are for inference which are not really being leveraged at all, although you can probably fit a larger bsz than traditional MHA/GQA with this. This did save about 5% on params, that amount should scale much more as the network size increases. The run time is very similar to MHA/GQA at this scale.
 # Training Metrics
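
The context lines in this hunk say the training data is JSONL with a `"text"` field. A minimal sketch of reading such a file follows, assuming one JSON object per line, each carrying a `"text"` key. The helper name `iter_texts` is hypothetical and is not taken from `test_train.py`.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_texts(path: str) -> Iterator[str]:
    """Yield the "text" field of each record in a JSONL file.

    Assumption: one JSON object per line, each with a "text" key.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            yield record["text"]
```

Iterating lazily keeps memory use flat for large files; a record without a `"text"` key raises `KeyError`, which surfaces malformed data early.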