"<extra_0>" is not special token ? I got 5 token_ids ，is it right？

by ShelterW - opened Jan 15

Jan 15

•

tokenizer.encode("<extra_0>")

[27,15460,62,15,29]

ShelterW

Jan 15

•

edited Jan 15

tokenizer.decode([34139]) = ']<'
tokenizer.decode([27]) = '<'

This looks like a bug.

ShelterW

Jan 15

•

edited Jan 15

I suggest using:

"\n<extra_0>"

".<extra_0>"

Zhenru

Qwen org Jan 16

•

edited Jan 16

Which tokenizer are you using? In this tokenizer config file, '<extra_0>' is a special token:

"151651": {
      "content": "<extra_0>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
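
A quick way to check this against your own local config is sketched below. It uses only the standard library. It assumes the entry sits under an `added_tokens_decoder` key in `tokenizer_config.json`, which the excerpt does not show, and `find_added_token` is a hypothetical helper name.

```python
import json

# Fragment copied from the reply, wrapped in "added_tokens_decoder".
# ASSUMPTION: that is the enclosing key in tokenizer_config.json;
# the excerpt in this thread does not show it.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "151651": {
      "content": "<extra_0>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  }
}
"""


def find_added_token(config, content):
    """Return (token_id, entry) for the added token with this content, or None."""
    for token_id, entry in config.get("added_tokens_decoder", {}).items():
        if entry.get("content") == content:
            return int(token_id), entry
    return None


config = json.loads(CONFIG_TEXT)
match = find_added_token(config, "<extra_0>")
if match is None:
    print("<extra_0> is not listed as an added token")
else:
    token_id, entry = match
    print(token_id, entry["special"])  # prints: 151651 True
```

To run it on a real checkout, replace `json.loads(CONFIG_TEXT)` with `json.load` on the opened config file. If the lookup returns `None`, the tokenizer you loaded probably does not define `<extra_0>` at all, which would explain getting several ids back.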

ShelterW

Jan 16

Yeah, my mistake.

ShelterW changed discussion status to closed Jan 16
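
For anyone landing here later: getting five ids is what you would expect when the loaded tokenizer does not have "<extra_0>" registered, so the string is split into ordinary pieces. The toy sketch below only illustrates that mechanism. The vocabulary, the ids, and the `toy_encode` function are made up for illustration and are not the real Qwen tokenizer or its BPE merges.

```python
import re

# Toy vocabulary for illustration only; these ids are NOT from any real tokenizer.
TOY_VOCAB = {"<": 0, "extra": 1, "_": 2, "0": 3, ">": 4}
# Hypothetical special-token table, also with a made-up id.
TOY_SPECIAL = {"<extra_0>": 100}


def toy_encode(text, special_tokens=None):
    """Encode text, matching registered special tokens as whole units first."""
    special_tokens = special_tokens or {}
    if special_tokens:
        # The capturing group makes re.split keep the special tokens in the result.
        pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
        chunks = re.split(pattern, text)
    else:
        chunks = [text]

    ids = []
    for chunk in chunks:
        if not chunk:
            continue
        if chunk in special_tokens:
            ids.append(special_tokens[chunk])
            continue
        # Greedy longest-match over the toy vocabulary (a stand-in for BPE).
        i = 0
        while i < len(chunk):
            for piece in sorted(TOY_VOCAB, key=len, reverse=True):
                if chunk.startswith(piece, i):
                    ids.append(TOY_VOCAB[piece])
                    i += len(piece)
                    break
            else:
                raise ValueError(f"no toy vocab entry for {chunk[i]!r}")
    return ids


without_special = toy_encode("<extra_0>")
with_special = toy_encode("<extra_0>", TOY_SPECIAL)
print(without_special)  # five ordinary pieces
print(with_special)     # a single special-token id
```

The same string yields five ids when the token is unknown and one id when it is registered, which matches the difference between the output in the first post and the config quoted in the reply.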

Ashmc

Jan 17

I suggest using
"\n<extra_0>"
or
``
