Why vocab size is 32032? (Extra 30 tokens)
opened by theodotus
Is it a typo? It should be 32002 (32000 base tokens + 2 ChatML tokens).
theodotus changed discussion title from "Why vocab size is 32032" to "Why vocab size is 32032? (Extra 30 tokens)"
We use the ChatML format, so we have to add 2 extra tokens to the tokenizer:
https://huggingface.co/wandb/mistral-7b-zephyr-dpo/blob/main/tokenizer_config.json
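For anyone puzzled by the remaining 30 tokens, here is a minimal sketch of the usual `transformers` recipe for adding the ChatML tokens. The base checkpoint name and the `pad_to_multiple_of=32` value are assumptions on my part (the thread doesn't show this repo's training script), but note that rounding 32002 up to a multiple of 32 would give exactly 32032:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative base checkpoint; Mistral's base vocabulary has 32000 tokens.
base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# ChatML adds two control tokens: 32000 + 2 = 32002.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)

# Resizing with pad_to_multiple_of=32 rounds 32002 up to 32032. Padding the
# embedding matrix to a multiple like this is a common efficiency trick and
# would account for the extra 30 rows, but that it was used here is a guess.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)
print(model.get_input_embeddings().weight.shape[0])  # 32032
```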
tcapelle changed discussion status to closed