Inconsistent prompt format. Which is correct, the Model card or the tokenizer_config.json?
The Model card says it is important to get the prompt template correct or else the model will produce sub-optimal outputs, but which prompt template is correct? Two different ones have been given.
The one from the model card:
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
The one from tokenizer_config.json:
<s>[INST] Instruction [/INST]Model answer</s>[INST] Follow-up instruction [/INST]
Also, are we supposed to leave every EOS token in for each bot response in the conversation? That's what the pseudocode in the model card and the prompt template in tokenizer_config.json imply, but I haven't seen that done before.
Could someone from the team like @pstock or @timlacroix clear this up? Appreciate it.
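For reference, something like this should show exactly what the chat template shipped with the model renders to (a minimal sketch, assuming a recent transformers with apply_chat_template; the messages are just placeholders):

from transformers import AutoTokenizer

# Load the tokenizer that ships with the model; its chat template is the one
# defined in tokenizer_config.json.
tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# Placeholder two-turn conversation, mirroring the examples above.
messages = [
    {"role": "user", "content": "Instruction"},
    {"role": "assistant", "content": "Model answer"},
    {"role": "user", "content": "Follow-up instruction"},
]

# Render to a string (not token ids) so the spacing and EOS placement are visible.
print(repr(tok.apply_chat_template(messages, tokenize=False)))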
I've been wondering this too, and after quite a bit of experimentation I'm fairly sure it's the first prompt format and that the difference is just a typo. Mistral Instruct uses the first prompt format too:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
I've also found it's important NOT to leave a space after the closing [/INST]: even though the model irritatingly adds a space itself when replying, adding it manually can sometimes cause the output to come up with Chinese Unicode characters (deepseek-llm also does this if you add the space!). For codellama, though, it's VERY important to add the space (and I think for llama2 it should be there too, although it doesn't seem to matter so much).
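If anyone wants to see what that extra space does at the token level, something like this should show it (a quick sketch, assuming the Mistral-7B-Instruct-v0.2 tokenizer and a recent transformers; the prompt is just the condiment example from the model page):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# The same prompt with and without a trailing space after the closing [/INST].
without_space = "[INST] What is your favourite condiment? [/INST]"
with_space = "[INST] What is your favourite condiment? [/INST] "

# Print the token pieces so any difference caused by the trailing space is visible.
print(tok.convert_ids_to_tokens(tok.encode(without_space)))
print(tok.convert_ids_to_tokens(tok.encode(with_space)))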
@jukofyork I think the tokenizer_config.json prompt is more than just a typo; as for the rest, I've found the same.
I've tested the tokenizer_config.json prompt extensively, and I think the reason it removes spaces is that, as you mention, the model automatically adds a space between [/INST] and Model answer, and it causes problems if you add the space manually. I think the same is true for the space between <s> and [INST], where the model will add it in automatically. However, the model will not add a space between </s> and [INST], so I'm not sure why that space was removed, and I still don't know whether we should leave the EOS token in or out of every bot response.
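To make the spacing concrete, this is roughly how a multi-turn prompt would be assembled under the tokenizer_config.json reading (just a sketch; build_prompt is a made-up helper, and whether the trailing </s> belongs on every bot turn is exactly what I'm unsure about):

def build_prompt(turns):
    # turns is a list of (instruction, answer) pairs; the last answer may be None.
    # Spacing follows the tokenizer_config.json example: no space after <s>,
    # [/INST] or </s>, and </s> kept after every completed bot response.
    prompt = "<s>"
    for instruction, answer in turns:
        prompt += f"[INST] {instruction} [/INST]"
        if answer is not None:
            prompt += f"{answer}</s>"
    return prompt

# Placeholder two-turn conversation awaiting the next model reply.
print(build_prompt([
    ("Instruction", "Model answer"),
    ("Follow-up instruction", None),
]))

That reproduces the tokenizer_config.json string above character for character.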
I was having a lot of trouble with the prompt templates and tried my best to find the correct ones for as many of the original/official models as possible:
https://github.com/jmorganca/ollama/issues/1977
Just one tiny mistake can make a huge difference for some of the models, and it's probably not all that obvious unless you try to get them to ingest a very large amount of source code or similar.
codellama is the worst affected by this and I do wonder if a lot of the benchmarks people have run were with the wrong template - it's actually a lot better than people realise if you use the correct template.
I'm also not sure about the need to use the <s> type tags, but I followed the official instructions to the letter (e.g. llama2 and codellama say to add them, Mixtral and Mistral say to only add them to the first message, etc.).
If there's one thing we desperately need, it's some standard prompt template for future models!
I spent all afternoon running different experiments and am actually shocked at how much finding the proper prompt has improved all 3 models:
It's made Mistral about as good as the other 2 were before, and the other 2 are now MUCH better, with all the weirdness (i.e. where they claimed to make changes to code when they didn't, etc.) now gone.
I've marked the spaces with '■' so they stand out, but you will need to change them back to real spaces. Also remember that if you aren't using Ollama or llama.cpp, you might need to add back the <s> prefix:

Mistral and Miqu:
TEMPLATE """{{ if and .First .System }}[INST]■{{ .System }}
Please await further instructions and simply respond with 'Understood'.■[/INST]
Understood</s>■
{{ end }}[INST]■{{ .Prompt }}■[/INST]
{{ .Response }}"""
This agrees with the example on the Mistral page:
text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Mixtral:
TEMPLATE """{{ if and .First .System }}■[INST]■{{ .System }}
Please await further instructions and simply respond with 'Understood'.■[/INST]■
Understood</s>
{{ end }}■[INST]■{{ .Prompt }}■[/INST]■
{{ .Response }}"""
This sort of agrees with the example on the Mixtral page:
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
But it seems that using the newlines before the response, like in the Mistral example, is essential.
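For comparison, the Mixtral version written out the same way, with its slightly different space placement (again just a sketch with placeholder values):

# Rough rendering of the Mixtral TEMPLATE above: spaces before each [INST] and
# after each [/INST], but no space after </s> (placeholder values).
system = "You are a helpful assistant."
prompt = "Instruction"

text = (
    "<s>"  # added back by hand when not using Ollama/llama.cpp
    " [INST] " + system + "\n"
    "Please await further instructions and simply respond with 'Understood'. [/INST] \n"
    "Understood</s>\n"
    " [INST] " + prompt + " [/INST] \n"
)
print(text)  # the model's reply is generated straight after the final newline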
Can someone provide an example of how we can add chain of thought and multiple examples in the instruction?