Update README.md
README.md
CHANGED
@@ -41,8 +41,7 @@ General-purpose Swich transformer based Japanese language model
 ```
 
 
-
-## Masked Language Model
+## Masked Language Model And Text Generation
 
 ```python
 >>> from transformers import AutoModel, AutoTokenizer, trainer_utils
@@ -51,9 +50,13 @@ General-purpose Swich transformer based Japanese language model
 >>> model = AutoModel.from_pretrained("Tanrei/GPTSAN-japanese").to(device)
 >>> tokenizer = AutoTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
 >>> x_token = tokenizer.encode("", prefix_text="武田信玄は、<|inputmask|>時代ファンならぜひ押さえ<|inputmask|>きたい名将の一人。", return_tensors="pt").to(device)
->>>
->>>
+>>> trainer_utils.set_seed(30)
+>>> out_lm_token = model.generate(x_token, max_new_tokens=50)
+>>> out_mlm_token = model(x_token)[0].argmax(axis=-1)
+>>> tokenizer.decode(out_mlm_token[0])
 "武田信玄は、戦国時代ファンならぜひ押さえておきたい名将の一人。"
+>>> tokenizer.decode(out_lm_token[0][x_token.shape[1]:])
+"武田氏の三代に渡った武田家のひとり\n甲斐市に住む、日本史上最大の戦国大名。"
 ```
 
 
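
The two decode calls added in this change read the model output in different ways: `argmax(axis=-1)` picks the highest-scoring token at every input position (the filled-in masks), while the slice `[x_token.shape[1]:]` keeps only what follows the prompt, which assumes `generate` returns the prompt tokens followed by the new ones. A minimal sketch of both operations on toy data, with plain lists standing in for tensors; the helper names `argmax_last_axis` and `strip_prompt` and all values are hypothetical and not part of the README:

```python
# Toy illustration only: plain lists stand in for tensors, and the
# logits and token IDs below are invented for this sketch.

def argmax_last_axis(logits):
    """Return, for each position, the index of the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def strip_prompt(generated_ids, prompt_length):
    """Drop the leading prompt tokens, keeping only the continuation."""
    return generated_ids[prompt_length:]


# Hypothetical per-position scores over a 4-token vocabulary (3 positions).
logits = [
    [0.1, 2.5, 0.3, 0.0],
    [1.9, 0.2, 0.4, 0.1],
    [0.0, 0.1, 0.2, 3.0],
]
mlm_ids = argmax_last_axis(logits)  # one token ID per input position

# Assumed shape of a generate() result: 3 prompt IDs followed by 2 new IDs.
prompt_ids = [7, 8, 9]
generated_ids = prompt_ids + [4, 5]
new_ids = strip_prompt(generated_ids, len(prompt_ids))

print(mlm_ids)
print(new_ids)
```

In the README snippet the same roles are played by `out_mlm_token` (per-position argmax) and `out_lm_token[0][x_token.shape[1]:]` (continuation only), each passed to `tokenizer.decode`.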