kokiryu committed
Commit 7974d8b · 1 Parent(s): 08b2a7b

Update README

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -13,6 +13,8 @@ This is a forked version of DistilBERT model pre-trained on 131 GB of Japanese w
  The teacher model is BERT-base that was built in-house at LINE.
  The model was trained by [LINE Corporation](https://linecorp.com/).

+ The difference from the [original repository](https://huggingface.co/line-corporation/line-distilbert-base-japanese) is the tokenizer code. In this repository, we updated it to work with `transformers>=4.34` after a [tokenizer refactoring](https://github.com/huggingface/transformers/pull/23909).
+
  ## For Japanese

  https://github.com/line/LINE-DistilBERT-Japanese/blob/main/README_ja.md is written in Japanese.
@@ -21,7 +23,8 @@ https://github.com/line/LINE-DistilBERT-Japanese/blob/main/README_ja.md is writt

  ```python
  from transformers import AutoTokenizer, AutoModel
- tokenizer = AutoTokenizer.from_pretrained("line-corporation/line-distilbert-base-japanese", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("liwii/line-distilbert-base-japanese-fork", trust_remote_code=True)
+ # The model is the same as the original repository
  model = AutoModel.from_pretrained("line-corporation/line-distilbert-base-japanese")

  sentence = "LINE株式会社で[MASK]の研究・開発をしている。"
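
The README change above says the fork's tokenizer code targets releases of `transformers` from 4.34 onward. One way a caller might act on that is to pick the tokenizer repository from the installed version. The following is a minimal sketch under stated assumptions, not part of this commit: the helper `pick_tokenizer_repo` and the constant names are hypothetical, and falling back to the original repository for versions before 4.34 is an assumption, since the README does not say which versions the original tokenizer supports.

```python
# Repository IDs as they appear in the README diff.
FORK_TOKENIZER_REPO = "liwii/line-distilbert-base-japanese-fork"
ORIGINAL_REPO = "line-corporation/line-distilbert-base-japanese"

# The README states the fork works with transformers>=4.34.
REFACTOR_VERSION = (4, 34)


def _major_minor(version):
    """Return (major, minor) as integers from a version string such as '4.34.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_tokenizer_repo(transformers_version):
    """Hypothetical helper: choose the repository to load the tokenizer from.

    Assumption: versions before 4.34 use the original repository's tokenizer.
    """
    if _major_minor(transformers_version) >= REFACTOR_VERSION:
        return FORK_TOKENIZER_REPO
    return ORIGINAL_REPO
```

A caller could pass `transformers.__version__` to the helper and hand the result to `AutoTokenizer.from_pretrained(..., trust_remote_code=True)`. The model itself would still be loaded from the original repository, as the README comment notes.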