Tokenizer difference between DeepSeek and Qwen3

#227
by yangsketch - opened

Hi,

Is the DeepSeek tokenizer different from the Qwen2.5 tokenizer? When you used DeepSeek-R1 to distill into Qwen2.5, how did you align the two tokenizers? Could you describe the details? Thank you!
