Is "<extra_0>" not a special token? I got 5 token IDs; is that right?

#4
by ShelterW - opened
tokenizer.encode("<extra_0>")
[27, 15460, 62, 15, 29]
tokenizer.decode([34139]) = ']<'
tokenizer.decode([27]) = '<'

Is this a bug?

I suggest using

"\n<extra_0>"

or

".<extra_0>"

Which tokenizer are you using? In the tokenizer config for this model, '<extra_0>' is a special token:

"151651": {
  "content": "<extra_0>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
}
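Why this settles the question, as a rough illustration: a tokenizer that has `<extra_0>` registered as an added special token generally splits the input on such tokens before running the subword (BPE) model, so the whole string maps to one ID (151651 here). If a different tokenizer is loaded that does not know the token, the string goes through BPE like ordinary text and can come out as several IDs, like the five reported above. The sketch below is a toy model of that pre-split step under those assumptions, not the Hugging Face implementation. `special_tokens` and `split_on_special` are hypothetical helpers, and the fragment quoted above (which appears to come from the tokenizer config's added-token table) is wrapped in braces so it parses as JSON.

```python
import json
import re

# The added-token entry quoted above, wrapped in braces to form valid JSON.
CONFIG_FRAGMENT = """
{
  "151651": {
    "content": "<extra_0>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false,
    "special": true
  }
}
"""


def special_tokens(fragment: str) -> dict[str, int]:
    """Hypothetical helper: map each entry marked "special": true to its integer ID."""
    entries = json.loads(fragment)
    return {e["content"]: int(i) for i, e in entries.items() if e.get("special")}


def split_on_special(text: str, specials: dict[str, int]) -> list:
    """Toy pre-split: each registered special token becomes a single int ID;
    the remaining text segments stay as strings for the subword model."""
    if not specials:
        return [text] if text else []
    # Try longer tokens first so overlapping names match greedily.
    pattern = "|".join(re.escape(t) for t in sorted(specials, key=len, reverse=True))
    pieces = []
    # The capture group makes re.split keep the matched special tokens.
    for part in re.split(f"({pattern})", text):
        if part in specials:
            pieces.append(specials[part])
        elif part:  # drop empty strings produced at the edges
            pieces.append(part)
    return pieces


specials = special_tokens(CONFIG_FRAGMENT)
print(specials)                                   # {'<extra_0>': 151651}
print(split_on_special("<extra_0>", specials))    # [151651]
print(split_on_special("\n<extra_0>", specials))  # ['\n', 151651]
print(split_on_special("<extra_0>", {}))          # ['<extra_0>'] (left for BPE)
```

With the token registered, the bare string and the `"\n"` / `"."` prefixed variants all yield the special ID as a single piece; without it, the whole string is handed to the subword model, which is where a multi-ID result would come from.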

Yeah, my mistake.

ShelterW changed discussion status to closed

