Update the chat template to match the prompt format stated in the model card.
While switching backends, we encountered a severe degradation in our Mixtral model generation results.
Diving deeper into this issue, we found that the tokenizer relied on the HF model config and used the chat_template
from there as well.
For the following input:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Good, how are you?"},
    {"role": "user", "content": "Very good!"},
]
conversation_string = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(conversation_string)
# -> "<s>[INST] Hello, how are you? [/INST]Good, how are you?</s>[INST] Very good! [/INST]"
```
The model card states that this should be:

```
<s> [INST] Hello, how are you? [/INST] Good, how are you?</s> [INST] Very good! [/INST]
```
Although the difference is very small (only two spaces for a three-message conversation), the impact on generation results is very large. With the current implementation we frequently saw the model predict the `<eos>` token in unexpected places; since we are generating structured output, the returned structure was therefore often invalid. This change has overcome that issue, and the generated output matches the expected structure much more reliably.
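For illustration, the sketch below overrides the tokenizer's chat template with a minimal Jinja template that reproduces the model-card spacing. Treat it as an assumption-laden example rather than the exact template this change ships; the official template also includes role-alternation checks that are omitted here for brevity:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# Hypothetical minimal template that adds the spaces shown in the model card.
tokenizer.chat_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{{ ' [INST] ' + message['content'] + ' [/INST]' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ ' ' + message['content'] + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Good, how are you?"},
    {"role": "user", "content": "Very good!"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# -> "<s> [INST] Hello, how are you? [/INST] Good, how are you?</s> [INST] Very good! [/INST]"
```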
Related to the following issue: https://github.com/vllm-project/vllm/issues/2464
Just wanted to say that this seems to have fixed a lot of the issues I was having with my code.
Why is there no comment from the official maintainers, given the issue is so fundamental?
Also, why has the fix not been merged yet?
Commenting for visibility.
Did anybody check whether the recent commit https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commit/bbae113847402a22031211225b5ee45c005de7dd fixed this?
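A quick way to check would be to pin the tokenizer to that revision and compare the rendered prompt against the model card (untested sketch):

```python
from transformers import AutoTokenizer

# Pin the tokenizer to the commit in question so the chat template from that
# revision is used when rendering.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    revision="bbae113847402a22031211225b5ee45c005de7dd",
)

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Good, how are you?"},
    {"role": "user", "content": "Very good!"},
]

rendered = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
expected = "<s> [INST] Hello, how are you? [/INST] Good, how are you?</s> [INST] Very good! [/INST]"
print(rendered)
print("Matches model card:", rendered == expected)
```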
The new template should be more accurate yes 🔥
Still noticing the `<eos>` token being predicted in unexpected places. Are there any changes that need to be made to this?

`<s> [INST] Hello, how are you? [/INST]`