Commit f2e6a64 by Blackroot · verified · 1 Parent(s): 87050d7

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425).
  - `test_train.py` runs with the exact configurations used to train this model and is the reproduction script. Data is assumed to be in JSONL format with `"text":"example text", "text":"..."`
 
  # Notes:
- Memory efficient; many of the benefits here are for inference, which are not really being leveraged at all, although you can probably fit a larger bsz than traditional MHA/GQA with this. The run time is very similar to MHA/GQA at this scale.
+ One of the primary reported benefits of TPA is at inference time, which is not really being leveraged here, although you can probably fit a larger batch size than with traditional MHA/GQA. This did save about 5% on params, and that saving should grow considerably as the network size increases. The run time is very similar to MHA/GQA at this scale.
 
  # Training Metrics
 
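The README's data note suggests one JSON object per line carrying a `"text"` field. A minimal sketch of reading such a file, assuming each line looks like `{"text": "example text"}`; the exact schema `test_train.py` expects is not shown in this diff, and `iter_texts` is a hypothetical helper, not part of the repository.

```python
import io
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field from each non-empty JSONL line.

    Assumes one JSON object per line with a "text" key
    (a hypothetical schema inferred from the README note).
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if "text" not in record:
            raise KeyError(f"line {line_no}: missing 'text' field")
        yield record["text"]


if __name__ == "__main__":
    sample = io.StringIO('{"text": "example text"}\n\n{"text": "..."}\n')
    print(list(iter_texts(sample)))  # ['example text', '...']
```

For a real file, pass an open handle, e.g. `iter_texts(open("train.jsonl", encoding="utf-8"))`, where the path is a placeholder.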
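To make the inference and parameter remarks concrete: the TPA paper linked in the hunk header gives a per-token, per-layer KV cache of (R_K + R_V)(h + d_h) values, versus 2·h·d_h for MHA and 2·g·d_h for GQA with g KV heads. The sketch below compares those, plus the Q/K/V input-projection weight counts when each TPA factor is a plain linear map from d_model. All configuration values are hypothetical, not this model's settings, and the count ignores output projections, norms, biases, and positional-encoding details.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    d_model: int     # hidden size
    n_heads: int     # h, number of query heads
    head_dim: int    # d_h
    n_kv_heads: int  # g, KV heads used for the GQA comparison
    rank_q: int      # R_Q, TPA query rank
    rank_k: int      # R_K, TPA key rank
    rank_v: int      # R_V, TPA value rank


def kv_cache_per_token(cfg: AttnConfig) -> dict[str, int]:
    """Cached values per token per layer (element count, not bytes)."""
    return {
        "MHA": 2 * cfg.n_heads * cfg.head_dim,
        "GQA": 2 * cfg.n_kv_heads * cfg.head_dim,
        # TPA caches the head-side and dim-side factors of K and V.
        "TPA": (cfg.rank_k + cfg.rank_v) * (cfg.n_heads + cfg.head_dim),
    }


def qkv_proj_params(cfg: AttnConfig) -> dict[str, int]:
    """Weight count of the Q/K/V input projections only (no biases)."""
    hd = cfg.n_heads * cfg.head_dim
    return {
        "MHA": 3 * cfg.d_model * hd,
        "GQA": cfg.d_model * (hd + 2 * cfg.n_kv_heads * cfg.head_dim),
        # Each TPA factor maps d_model -> rank * h or rank * d_h.
        "TPA": cfg.d_model
        * (cfg.rank_q + cfg.rank_k + cfg.rank_v)
        * (cfg.n_heads + cfg.head_dim),
    }


if __name__ == "__main__":
    # Hypothetical small-model configuration for illustration only.
    cfg = AttnConfig(d_model=768, n_heads=12, head_dim=64, n_kv_heads=4,
                     rank_q=6, rank_k=2, rank_v=2)
    print(kv_cache_per_token(cfg))  # {'MHA': 1536, 'GQA': 512, 'TPA': 304}
    print(qkv_proj_params(cfg))     # {'MHA': 1769472, 'GQA': 983040, 'TPA': 583680}
```

For fixed ranks, the TPA-to-MHA projection ratio is (R_Q + R_K + R_V)(h + d_h) / (3·h·d_h), which shrinks as h·d_h grows. Attention projections are only part of the total parameter count (MLPs and embeddings are untouched), so whole-network savings come out smaller than this per-layer ratio suggests.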