Update README
README.md
CHANGED
@@ -13,6 +13,8 @@ This is a forked version of DistilBERT model pre-trained on 131 GB of Japanese w
 The teacher model is BERT-base that built in-house at LINE.
 The model was trained by [LINE Corporation](https://linecorp.com/).
 
+The difference from the [original repository](https://huggingface.co/line-corporation/line-distilbert-base-japanese) is the tokenizer code. In this repository, we updated it to work with `transformer>=4.34` after a [tokenizer refactoring](https://github.com/huggingface/transformers/pull/23909).
+
 ## For Japanese
 
 https://github.com/line/LINE-DistilBERT-Japanese/blob/main/README_ja.md is written in Japanese.
@@ -21,7 +23,8 @@ https://github.com/line/LINE-DistilBERT-Japanese/blob/main/README_ja.md is writt
 
 ```python
 from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("liwii/line-distilbert-base-japanese-fork", trust_remote_code=True)
+# The model is the same as the original repository
 model = AutoModel.from_pretrained("line-corporation/line-distilbert-base-japanese")
 
 sentence = "LINE株式会社で[MASK]の研究・開発をしている。"
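
A note on the version constraint in the added README paragraph: the sketch below picks a tokenizer repository from a `transformers` version string. It assumes, beyond what the README states, that versions before 4.34 should keep using the original repository's tokenizer. `tokenizer_repo_for`, `parse_major_minor`, and the two constants are hypothetical names introduced only for illustration. The README text writes `transformer>=4.34`, while the package itself is named `transformers`.

```python
import re
from typing import Tuple

# Repository IDs taken from the README diff.
ORIGINAL_REPO = "line-corporation/line-distilbert-base-japanese"
FORK_REPO = "liwii/line-distilbert-base-japanese-fork"

# Version threshold named in the README ("transformer>=4.34").
FORK_MIN_VERSION = (4, 34)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.34.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def tokenizer_repo_for(transformers_version: str) -> str:
    """Return the tokenizer repository to use for a transformers version.

    Assumption (not stated in the README): versions older than 4.34
    should load the tokenizer from the original repository.
    """
    # Compare as integer tuples so that 4.4 sorts below 4.34.
    if parse_major_minor(transformers_version) >= FORK_MIN_VERSION:
        return FORK_REPO
    return ORIGINAL_REPO
```

With `transformers` installed, `transformers.__version__` could serve as the input string, and the returned ID could be passed to `AutoTokenizer.from_pretrained(repo, trust_remote_code=True)`. The model itself is loaded from the original repository in either case, as the diff's comment notes.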