Locutusque/TinyMistral-248M-v2


Locutusque committed on Dec 21, 2023

Commit 29bb759 · 1 parent: 94a43b2

Update README.md

Files changed (1)

README.md +1 -1

@@ -11,7 +11,7 @@ Like version 1, this model will be trained on a single GPU, with hopes of gettin
 - Train on 1,000,000 examples of Skylion007/openwebtext at a learning rate of 3e-4 and batch size of 32
 - Once perplexity reaches an average of ~100, a cosine scheduler will be applied, and batch size will be increased to 4096
-- Once the perplexity reaches an average of 50, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to 393,216.
+- Once the perplexity reaches an average of 50, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to 12,288.
 - I'm open to any suggestions to modify this roadmap if you feel it isn't sufficient!
 # Disclaimer
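
The roadmap in the changed lines is a perplexity-gated schedule, so it may help to see it as a small lookup. The following is only a sketch under stated assumptions, not the author's training code: the names `Stage`, `select_stage`, and `accumulation_steps` are hypothetical, the thresholds are read as "at or below" the stated average perplexity, and the larger batch sizes are assumed to be reached on a single GPU through gradient accumulation over micro-batches of 32 (the README does not say how they are reached, nor what unit the batch size is measured in).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One roadmap stage (hypothetical structure, for illustration only)."""
    name: str
    batch_size: int        # batch size stated in the README for this stage
    cosine_scheduler: bool  # whether a cosine scheduler is assumed active


# Stages as described in the roadmap after this commit.
# Assumption: the cosine scheduler stays on in the last stage; the README
# only says it "will be applied" once perplexity reaches ~100.
STAGES = (
    Stage("initial", batch_size=32, cosine_scheduler=False),
    Stage("cosine", batch_size=4096, cosine_scheduler=True),
    Stage("final", batch_size=12288, cosine_scheduler=True),
)


def select_stage(avg_perplexity: float) -> Stage:
    """Pick the roadmap stage for a given running-average perplexity."""
    if avg_perplexity > 100:
        return STAGES[0]
    if avg_perplexity > 50:
        return STAGES[1]
    return STAGES[2]


def accumulation_steps(target_batch: int, micro_batch: int = 32) -> int:
    """Gradient-accumulation steps needed to reach target_batch (assumed approach)."""
    return math.ceil(target_batch / micro_batch)


if __name__ == "__main__":
    for ppl in (150.0, 80.0, 45.0):
        stage = select_stage(ppl)
        print(ppl, stage.name, stage.batch_size, accumulation_steps(stage.batch_size))
```

Under these assumptions, the revised final batch size of 12,288 corresponds to 384 accumulation steps at a micro-batch of 32, compared with 128 steps for the 4096 stage.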