Commit 29bb759 (1 parent: 94a43b2)
Update README.md
README.md (changed)
@@ -11,7 +11,7 @@ Like version 1, this model will be trained on a single GPU, with hopes of gettin
 
 - Train on 1,000,000 examples of Skylion007/openwebtext at a learning rate of 3e-4 and batch size of 32
 - Once perplexity reaches an average of ~100, a cosine scheduler will be applied, and batch size will be increased to 4096
-- Once the perplexity reaches an average of 50, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to
+- Once the perplexity reaches an average of 50, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to 12,288.
 
 - I'm open to any suggestions to modify this roadmap if you feel it isn't sufficient!
 
 # Disclaimer
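The roadmap in this README is effectively a staged training configuration: perplexity thresholds (~100, then 50) trigger changes in batch size (32, 4096, 12,288), the learning-rate schedule, and the dataset mix. Below is a minimal Python sketch of that logic, not the author's training code. Several details are assumptions not stated in the README and are labeled in the comments: that phase 2 keeps training on openwebtext, that phase 3 keeps the cosine schedule, that the cosine schedule decays from the initial 3e-4 to 0, and that the large batch sizes are reached on the single GPU via gradient accumulation over a hypothetical micro-batch of 32 (`MICRO_BATCH`). Perplexity is computed as the exponential of the mean per-token cross-entropy loss.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    datasets: tuple[str, ...]
    batch_size: int   # effective (global) batch size from the roadmap
    cosine_lr: bool   # whether the cosine learning-rate schedule is active


# Phases transcribed from the roadmap. Assumptions (not stated in the README):
# phase 2 keeps training on openwebtext, and phase 3 keeps the cosine schedule.
PHASES = (
    Phase(("Skylion007/openwebtext",), 32, False),
    Phase(("Skylion007/openwebtext",), 4096, True),
    Phase(("graelo/wikipedia", "mattymchen/refinedweb-3m"), 12_288, True),
)
# Average perplexity needed to leave phase 0 (~100) and phase 1 (50).
THRESHOLDS = (100.0, 50.0)
BASE_LR = 3e-4
MICRO_BATCH = 32  # hypothetical per-step batch assumed to fit on the single GPU


def perplexity(token_losses: list[float]) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(sum(token_losses) / len(token_losses))


def advance(phase_idx: int, avg_ppl: float) -> int:
    """Move forward (never backward) once average perplexity reaches the next threshold."""
    while phase_idx < len(THRESHOLDS) and avg_ppl <= THRESHOLDS[phase_idx]:
        phase_idx += 1
    return phase_idx


def cosine_lr(step: int, total_steps: int,
              lr_max: float = BASE_LR, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_max to lr_min over total_steps (lr_min=0 is an assumption)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def accumulation_steps(phase: Phase) -> int:
    """Gradient-accumulation steps needed to reach the phase's batch size on one GPU."""
    return max(1, phase.batch_size // MICRO_BATCH)
```

For example, `advance(0, 99.0)` returns 1 (the cosine phase), and `accumulation_steps(PHASES[2])` is 384, since 12,288 / 32 = 384. Transitions are one-way so that a temporary perplexity spike after switching to the new datasets does not send training back to an earlier phase.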