lxhlxhlxh555 committed
Commit 0ede01e
Parent(s): c3a9469
Update README.md
README.md CHANGED
@@ -22,8 +22,7 @@ special tokens for masking, and then require the model to predict the original t
 
 ### Pretraining Hyperparameters
 
-We used a batch size of
-The model was trained for xx steps.
+We used a batch size of 32, a maximum sequence length of 256, and a learning rate of 5e-5 for pre-training our models.
 
 ## How to use the model
 
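The added README line states three pretraining hyperparameters. As a minimal sketch only, they could be collected into a single config object; the dictionary and its key names are illustrative assumptions, not taken from this repository's actual training code:

```python
# Hypothetical config dict gathering the hyperparameters named in the
# README diff above. Key names are illustrative, not from the repo.
pretrain_config = {
    "batch_size": 32,       # batch size from the updated README
    "max_seq_length": 256,  # maximum sequence length from the updated README
    "learning_rate": 5e-5,  # learning rate from the updated README
}

for name, value in pretrain_config.items():
    print(f"{name}: {value}")
```

A training script would typically read such values from one place so the README and the code cannot drift apart.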