GPT-NeoX model pretrained on a 2.06 GB English news dataset. Training took about 2 hours and 10 minutes to reach 10,000 iterations on a p3dn.24xlarge instance.
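
As a rough sanity check, the figures above imply an average step time of a bit under one second. The sketch below only redoes that arithmetic. It assumes the quoted wall-clock time is approximate and covers nothing but the 10,000 iterations; the function and constant names are illustrative and not part of any GPT-NeoX tooling.

```python
# Back-of-the-envelope throughput from the figures quoted above.
# Assumption: "about 2 hours and 10 minutes" is treated as exactly 130 minutes.
TOTAL_SECONDS = (2 * 60 + 10) * 60  # 130 minutes expressed in seconds
ITERATIONS = 10_000


def seconds_per_iteration(total_seconds: float, iterations: int) -> float:
    """Average wall-clock time spent on one training iteration."""
    return total_seconds / iterations


def iterations_per_second(total_seconds: float, iterations: int) -> float:
    """Average number of training iterations completed per second."""
    return iterations / total_seconds


if __name__ == "__main__":
    print(f"{seconds_per_iteration(TOTAL_SECONDS, ITERATIONS):.2f} s/iteration")
    print(f"{iterations_per_second(TOTAL_SECONDS, ITERATIONS):.2f} iterations/s")
```

These are averages over the whole run, so they say nothing about warm-up, checkpointing, or evaluation pauses.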