vocab-transformers/distilbert-word2vec_256k-MLM_best


nreimers committed on Apr 11, 2022

Commit ee249ca · 1 Parent(s): 69dbbb4

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -2,4 +2,6 @@
 This model has a word2vec token embedding matrix with 256k entries. The word2vec was trained on 100GB data from C4, MSMARCO, News, Wikipedia, S2ORC, for 3 epochs.
-Then the model was trained on this dataset with MLM for 250k steps (batch size 64). The token embeddings were NOT updated.

 This model has a word2vec token embedding matrix with 256k entries. The word2vec was trained on 100GB data from C4, MSMARCO, News, Wikipedia, S2ORC, for 3 epochs.
+Then the model was trained on this dataset with MLM for 1.37M steps (batch size 64). The token embeddings were NOT updated.
+For the initial word2vec weights with Gensim see: [https://huggingface.co/vocab-transformers/distilbert-word2vec_256k-MLM_1M/tree/main/word2vec](https://huggingface.co/vocab-transformers/distilbert-word2vec_256k-MLM_1M/tree/main/word2vec)
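This commit changes the reported MLM training length from 250k to 1.37M steps, both at batch size 64. The sketch below is a back-of-the-envelope calculation of how many training sequences each figure corresponds to. It assumes one batch of 64 sequences per optimizer step with no gradient accumulation. The README does not state this, and it also does not give a sequence length, so the numbers are sequence counts, not token counts. "1.37M" is also presumably rounded. The helper name `sequences_seen` is hypothetical and only used for illustration.

```python
# Hypothetical helper for illustration; the README reports steps and batch size only.
def sequences_seen(steps: int, batch_size: int) -> int:
    """Training sequences processed, ASSUMING one batch per step (no gradient accumulation)."""
    return steps * batch_size


BATCH_SIZE = 64  # batch size stated in the README
old_run = sequences_seen(250_000, BATCH_SIZE)    # figure from the removed line
new_run = sequences_seen(1_370_000, BATCH_SIZE)  # figure from the added line (likely rounded)

print(f"250k steps:  {old_run:,} sequences")
print(f"1.37M steps: {new_run:,} sequences")
print(f"ratio: {new_run / old_run:.2f}x")
```

Under these assumptions, the corrected figure corresponds to roughly 5.5 times as many training sequences as the value it replaces.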