Blackroot committed on
Commit
87050d7
·
verified ·
1 Parent(s): e4b84eb

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -5,8 +5,7 @@ Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425).
  - `test_train.py` runs with the exact configurations used to train this model and is the reproduction script. Data is assumed to be in JSONL format with `"text":"example text", "text":"..."`
 
  # Notes:
- Compared to the control model of Smollm2, this is bordering on incoherent. Potentially this model size is too small to correctly leverage differential attention. It's clearly picked up on some ideas in language, but is generally worse than the control model using GQA in terms of human output.
-
+ Memory efficient. Many of the benefits here are for inference, which is not really being leveraged at all, although you can probably fit a larger bsz than traditional MHA/GQA with this. The run time is very similar to MHA/GQA at this scale.
 
  # Training Metrics
 
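
For reference, the data format named in the `test_train.py` context line could be read along these lines. This is only a sketch, not code from the repository: it assumes each JSONL line is a separate JSON object carrying a `"text"` field, and the helper name `iter_texts` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field of each non-empty JSONL line.

    Assumption: one JSON object per line, each with a "text" key.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if "text" not in record:
            raise KeyError(f'line {line_no}: missing "text" field')
        yield record["text"]


# In-memory sample standing in for a JSONL file.
sample = ['{"text": "example text"}', "", '{"text": "..."}']
texts = list(iter_texts(sample))
```

An open file handle can be passed in place of `sample`, since iterating over a text file yields one line at a time.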