"<extra_0>" is not a special token? I got 5 token ids, is that right?
#4
Opened by ShelterW
tokenizer.encode("<extra_0>")
[27,15460,62,15,29]
tokenizer.decode([34139]) = ']<'
tokenizer.decode([27]) = '<'
Is this a bug? I suggest using "\n<extra_0>" or ".<extra_0>".
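Five ids for "<extra_0>" is what you would expect if the string were *not* registered as an added/special token. In that case it is cut into ordinary vocabulary pieces ('<', the word, '_', '0', '>'). A registered token is matched as a whole before the normal vocabulary applies. The toy sketch below illustrates this. It is an assumption-laden illustration, not the real tokenizer: the vocabulary and ids are hypothetical, and the greedy longest-match loop stands in for real BPE.

```python
import re

# Hypothetical toy vocabulary and ids; NOT the model's real vocabulary.
BASE_VOCAB = {"<": 1, "extra": 2, "_": 3, "0": 4, ">": 5}
SPECIAL_TOKENS = {"<extra_0>": 100}  # hypothetical id for the registered token


def encode(text, special_tokens):
    """Split out registered special tokens first, then greedily match BASE_VOCAB."""
    if special_tokens:
        # Capturing group keeps the matched special tokens in re.split's output;
        # longer tokens are tried first so overlapping names match greedily.
        names = sorted(special_tokens, key=len, reverse=True)
        chunks = re.split("(" + "|".join(re.escape(t) for t in names) + ")", text)
    else:
        chunks = [text]

    ids = []
    for chunk in chunks:
        if not chunk:
            continue
        if chunk in special_tokens:
            ids.append(special_tokens[chunk])  # whole token -> one id
            continue
        # Greedy longest-match over the base vocabulary (stand-in for BPE).
        i = 0
        while i < len(chunk):
            for j in range(len(chunk), i, -1):
                if chunk[i:j] in BASE_VOCAB:
                    ids.append(BASE_VOCAB[chunk[i:j]])
                    i = j
                    break
            else:
                raise ValueError(f"no vocab entry for {chunk[i]!r}")
    return ids


print(encode("<extra_0>", {}))              # [1, 2, 3, 4, 5]
print(encode("<extra_0>", SPECIAL_TOKENS))  # [100]
```

With no special tokens registered, the string falls apart into five pieces. Once it is registered, it maps to a single id, even when glued to neighbouring text.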
Which tokenizer are you using? In this tokenizer config file, '<extra_0>' is registered as a special token:
"151651": {
"content": "<extra_0>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
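To check the config entry yourself, a small sketch follows. It assumes the snippet above sits under an `added_tokens_decoder` key in `tokenizer_config.json`, which is an assumption about the file layout. It looks up the token by its content and reports its id and `special` flag. `find_added_token` is a hypothetical helper, and the config is inlined here so the example runs on its own.

```python
import json

# The entry quoted above, wrapped in an assumed "added_tokens_decoder" key.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "151651": {
      "content": "<extra_0>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  }
}
"""


def find_added_token(config, content):
    """Return (token_id, entry) for the added token with this content, else None."""
    for token_id, entry in config.get("added_tokens_decoder", {}).items():
        if entry.get("content") == content:
            return int(token_id), entry
    return None


config = json.loads(CONFIG_TEXT)
token_id, entry = find_added_token(config, "<extra_0>")
print(token_id, entry["special"])  # 151651 True
```

With a tokenizer loaded from the same files, comparing `tokenizer.convert_tokens_to_ids("<extra_0>")` against this id is a quick sanity check. If the loaded tokenizer returns a different id, or encodes the string into several pieces, it is probably not reading this config.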
yeah, my mistake
ShelterW changed discussion status to closed