Tokenizer config seems broken
#4
by
Barahlush
- opened
There is a problem when using the model on a Windows machine: the chat template is read in the wrong encoding, which breaks the special tokens. For example, <｜Assistant｜> (the bars are fullwidth U+FF5C characters, not ASCII pipes) turns into <пЅњAssistantпЅњ>.
As a result, when downloading and using the tokenizer with transformers/unsloth, the chat template appends these broken sequences instead of the correct ones, and the input is not tokenized correctly (e.g. "<пЅњAssistantпЅњ>" is split into ~6 tokens instead of 1).
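The garbling is consistent with the UTF-8 bytes of the fullwidth bar U+FF5C being decoded with a legacy Windows locale codec (cp1251 on Russian-locale systems). A minimal sketch reproducing it, assuming you are reading tokenizer_config.json yourself (file name and the cp1251 locale are assumptions for illustration):

```python
# The chat template's special tokens use fullwidth vertical bars (U+FF5C),
# e.g. "<｜Assistant｜>". Each bar is three UTF-8 bytes (EF BD 9C); decoding
# them with cp1251 (a common Windows locale default) yields "пЅњ".
token = "<｜Assistant｜>"           # correct token, fullwidth bars
raw = token.encode("utf-8")         # bytes as stored on disk
broken = raw.decode("cp1251")       # what a locale-default read produces
print(broken)                       # <пЅњAssistantпЅњ>

# Workaround when opening the file manually: never rely on the platform
# default codec, always pass encoding="utf-8" explicitly.
# with open("tokenizer_config.json", encoding="utf-8") as f:
#     config = f.read()
```

On Python 3.10+, running with `-X warn_default_encoding` flags any `open()` call that silently uses the locale encoding, which helps track down where the misread happens.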
Barahlush changed discussion status to closed
Barahlush changed discussion status to open