The model was trained for 8 hours (1.75 epochs, 56,000 steps) on 8 × 80GB A100 GPUs with the following arguments:
!python train.py --config tv2o-large --max-len 4096 --acc-grad 8
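The run summary above implies a few derived figures: steps per epoch, training throughput, and the effective batch per optimizer update. The Python sketch below computes them under stated assumptions. It assumes that "steps" counts optimizer updates and that `--acc-grad 8` means 8 gradient-accumulation micro-batches per update on each GPU. The per-device batch size is not given in the source, so `per_device_batch=1` is a hypothetical placeholder. The function name `training_stats` is illustrative and is not part of `train.py`.

```python
"""Back-of-envelope numbers implied by the training run described above."""


def training_stats(steps: int, epochs: float, hours: float, num_gpus: int,
                   grad_accum: int, per_device_batch: int, max_len: int) -> dict:
    """Derive throughput and batch figures from a run summary.

    Assumes `steps` counts optimizer updates and that each update
    accumulates `grad_accum` micro-batches on every GPU.
    """
    steps_per_epoch = steps / epochs
    steps_per_second = steps / (hours * 3600)
    # Sequences contributing to a single optimizer update across all GPUs.
    effective_batch = per_device_batch * num_gpus * grad_accum
    # Upper bound: assumes every sequence fills the full --max-len window.
    max_tokens_per_update = effective_batch * max_len
    return {
        "steps_per_epoch": steps_per_epoch,
        "steps_per_second": steps_per_second,
        "effective_batch": effective_batch,
        "max_tokens_per_update": max_tokens_per_update,
    }


if __name__ == "__main__":
    # per_device_batch=1 is a hypothetical placeholder; the source does not state it.
    stats = training_stats(steps=56_000, epochs=1.75, hours=8, num_gpus=8,
                           grad_accum=8, per_device_batch=1, max_len=4096)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

With these inputs, one epoch is 32,000 steps and the run averages just under 2 steps per second. The effective batch scales linearly with whatever per-device batch size was actually used.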