Tom Aarsen committed
Commit 51c9b61 · 1 Parent(s): 6a07d13
Update tokenizer: Set lstrip=True for [MASK]
This allows mask filling with inputs such as "The dog [MASK]". Previously that didn't work, because most tokens in our tokenizer start with a space: the space before [MASK] was kept in the input, and since the model doesn't use double spaces, it would give odd results like "urn".
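To make the effect of the flag concrete, here is a toy sketch in plain Python. It is not the tokenizer library's implementation; the helper `split_on_special` is hypothetical and only mimics the idea that a special token with lstrip enabled absorbs the whitespace to its left.

```python
import re


def split_on_special(text, token="[MASK]", lstrip=False):
    """Toy splitter (hypothetical helper, not the real tokenizer).

    Isolates `token` from the surrounding text. With lstrip=True, any
    whitespace directly to the left of the token is absorbed into it.
    """
    pattern = re.escape(token)
    if lstrip:
        pattern = r"\s*" + pattern

    pieces = []
    pos = 0
    for match in re.finditer(pattern, text):
        if match.start() > pos:
            pieces.append(text[pos:match.start()])
        pieces.append(token)
        pos = match.end()
    if pos < len(text):
        pieces.append(text[pos:])
    return pieces


# Without lstrip, the trailing space stays in the text before the mask.
print(split_on_special("The dog [MASK]", lstrip=False))  # ['The dog ', '[MASK]']
# With lstrip, the space is swallowed by the mask token.
print(split_on_special("The dog [MASK]", lstrip=True))   # ['The dog', '[MASK]']
```

With the space removed from the input, the model is free to predict a token that carries its own leading space, which is how most tokens in this vocabulary are stored.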
- tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -258,7 +258,7 @@
     },
     "50284": {
       "content": "[MASK]",
-      "lstrip": false,
+      "lstrip": true,
       "normalized": false,
       "rstrip": false,
       "single_word": false,
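The same one-line change could be applied programmatically. The sketch below uses only the standard library and works on an in-memory dictionary. It assumes the token entries live under a top-level "added_tokens_decoder" key, which is not visible in the excerpt above; the helper `set_lstrip` and the sample dictionary are hypothetical.

```python
import json


def set_lstrip(config, token_id, value=True):
    """Set the lstrip flag for one added token (hypothetical helper).

    Assumes the config keeps token entries under "added_tokens_decoder",
    keyed by the token id as a string.
    """
    config["added_tokens_decoder"][token_id]["lstrip"] = value
    return config


# Sample data modelled on the fields shown in the diff; the enclosing
# "added_tokens_decoder" key is an assumption.
sample = {
    "added_tokens_decoder": {
        "50284": {
            "content": "[MASK]",
            "lstrip": False,
            "normalized": False,
            "rstrip": False,
            "single_word": False,
        }
    }
}

patched = set_lstrip(sample, "50284")
print(json.dumps(patched["added_tokens_decoder"]["50284"], indent=2))
```

Only the lstrip field changes; the other flags for the token are left as they were.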