Update README.md
README.md CHANGED
@@ -80,9 +80,9 @@ generated_text = tokenizer.decode(output_ids[0, input_length: ], skip_special_to
 
 ## Training
 
-For training, the learning rate is warmed up from
+For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over the first 2000 steps. We apply a weight decay of 0.1 and gradient clipping of 1.0, with an effective batch size of 81,920 tokens per gradient step distributed over 40 NVIDIA H100-64GB GPUs. We train with DeepSpeed in full `float32` precision. The training hyperparameters are summarized in the table below:
 
-| **Hyper-Parameter** |
+| **Hyper-Parameter** | **Value** |
 |---------------------|--------------------------|
 | Batch size | 40 |
 | Number of Epochs | 1 |
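The training paragraph above maps fairly directly onto a DeepSpeed configuration. The sketch below is a hypothetical illustration of how the stated values (warmup from 1e-7 to 3e-4 over 2000 steps, weight decay 0.1, gradient clipping 1.0, full float32) could be expressed; the per-GPU micro-batch, the assumed 2048-token sequence length, the optimizer choice, and the file name are illustrative assumptions, not taken from this repository.

```python
# Hypothetical DeepSpeed config mirroring the hyperparameters described above.
# This is an illustrative sketch, not the authors' actual training setup.
import json

ds_config = {
    # A global batch of 40 sequences over 40 GPUs implies a micro-batch of 1 per
    # GPU; with an assumed sequence length of 2048 tokens this yields
    # 40 * 2048 = 81,920 tokens per gradient step.
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,  # gradient clipping of 1.0
    "optimizer": {
        "type": "AdamW",  # optimizer choice is an assumption
        "params": {"lr": 3e-4, "weight_decay": 0.1},
    },
    "scheduler": {
        "type": "WarmupLR",  # ramps the LR from warmup_min_lr to warmup_max_lr
        "params": {
            "warmup_min_lr": 1e-7,
            "warmup_max_lr": 3e-4,
            "warmup_num_steps": 2000,
        },
    },
    # Full float32 training: keep both mixed-precision modes disabled.
    "fp16": {"enabled": False},
    "bf16": {"enabled": False},
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

A config like this could then be passed to a (hypothetical) training script through the DeepSpeed launcher, e.g. `deepspeed --num_gpus 40 train.py --deepspeed ds_config.json`.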