Space after [/INST]
I saw that you had to add a space to fix the model. Is it just one space in total that's needed, between [/INST] and the response? I'm starting to fine-tune, also using the Llama 2 chat format, and sometimes get [/INST] in the generated output (with just one space during tuning).
I actually had one too many spaces: the very last [/INST] should not have a trailing space, but all intermediate instances should.
See this function and format_chat_airoboros below it:
https://github.com/jondurbin/qlora/blob/8cd269bf9bd7753c92164934269019e12f23314f/train.py#L551
In the earlier version, I accidentally had one extra space on line 563.
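In case it's clearer in code, the layout I'm describing boils down to something like this (a minimal sketch, not the actual train.py logic; the function name and structure are just illustrative):

```python
# Minimal sketch of the llama-2 chat layout described above (illustrative,
# not the actual train.py code).
def format_llama2_chat(turns, final_user):
    """turns: list of (user, assistant) pairs already in the history."""
    prompt = ""
    for user, assistant in turns:
        # intermediate [/INST] instances keep a trailing space before the response
        prompt += f"<s>[INST] {user} [/INST] {assistant} </s>"
    # the very last [/INST] gets no trailing space
    prompt += f"<s>[INST] {final_user} [/INST]"
    return prompt
```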
I'm not sure why you'd see those in generation, though; maybe an issue with missing eos tokens? It could just be that your dataset length exceeds the model max length, so training never reaches an eos. Hard to say.
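A quick way to check the length theory, sketched under the assumption of a datasets-style dataset with a "text" column (model name and file path are just illustrative):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # gated; any llama tokenizer works
dataset = load_dataset("json", data_files="train.jsonl", split="train")     # illustrative path

max_len = 4096  # llama-2 context window
too_long = [i for i, ex in enumerate(dataset)
            if len(tokenizer(ex["text"]).input_ids) + 1 > max_len]  # +1 for the eos token
print(f"{len(too_long)} records would be truncated before reaching an eos token")
```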
Thanks for the clarification. I have add_eos_token=True as a setting, and removed any </s> tokens from the dataset. I could have some records exceeding the context length; I will have to check (they are foreign-language translations). I loaded up your dataset to learn more, using the functions in the script you reference. I see you use the Llama 2 chat format for the input feature, and the output feature is just the response (after the formatting function). I tried to load it into my own SFTTrainer code, but I think it isn't set up for features other than the default 'text'. I'm also not familiar with the Seq2SeqTrainer you use; do you recommend it over SFTTrainer? Thanks again!
I have been tweaking from the guide here: https://mlabonne.github.io/blog/posts/Fine_Tune_Your_Own_Llama_2_Model_in_a_Colab_Notebook.html, but I'm not sure if I can use your dataset with it in SFTTrainer.
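For reference, the workaround I'm considering is collapsing the two features into the single 'text' column SFTTrainer uses by default (a sketch; "input"/"output" are the feature names I saw in your dataset, and the file path is illustrative):

```python
from datasets import load_dataset

# Collapse the separate input/output features into the single "text" column
# that SFTTrainer uses by default.
dataset = load_dataset("json", data_files="airoboros.jsonl", split="train")
dataset = dataset.map(lambda ex: {"text": ex["input"] + ex["output"]})
# then construct SFTTrainer with dataset_text_field="text" as usual
```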
@jondurbin When training, do you then include that space before the trained AI response (that is, is the AI expected to output a space as its first token)? Otherwise, I'm confused why this makes sense; after all, there's a space after [/INST] before every other AI response in the history!
So is the lack of a space after [/INST] for the last query just a thing for inference, and for masking loss during training, but the space is still there for the last response while training?
E.g., we would train on "...[/INST] Some response </s>". But (assuming we are not training on inputs), the "...[/INST]" part is masked out for loss computation, and the trained completion is " Some response </s>", including the space at the start (and the one before </s>), correct?
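Concretely, the masking I have in mind looks like this (a sketch of standard input masking, not anyone's actual training code; the model repo is gated and just illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "[INST] Hello [/INST]"   # stands in for the full chat prefix
completion = " Some response"     # note the leading space

full_ids = tokenizer(prompt + completion).input_ids + [tokenizer.eos_token_id]
prompt_len = len(tokenizer(prompt).input_ids)

labels = list(full_ids)
labels[:prompt_len] = [-100] * prompt_len  # -100 labels are ignored by the loss
# caveat: SentencePiece can merge tokens at the prompt/completion boundary,
# so prompt_len should be sanity-checked on real data
```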
After some experimentation, I am getting blank responses in Oobabooga with a small probability when I train without the space before the last completion, versus adding the space, especially on completions with very long contexts. The problem doesn't appear if I include the extra space while training (so, the opposite of jondurbin's statement?).
The difference goes away if I strip the space in Ooba, but I believe Ooba does not do that by default. In other words, Ooba ends its completion request with "...[/INST]<extra space here>", so I think we should do the same while training.
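The tokenizer makes the mismatch concrete (a sketch; any SentencePiece Llama tokenizer should behave similarly):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

print(tok("[/INST]", add_special_tokens=False).input_ids)       # what meta's chat_utils sends
print(tok("[/INST] ", add_special_tokens=False).input_ids)      # what Ooba sends: trailing space is its own token
print(tok("[/INST] Some", add_special_tokens=False).input_ids)  # in training, the space usually folds into "▁Some"
# If training never ends a prompt on a bare space token, the model can respond
# oddly (e.g., with nothing) when the client appends one at inference time.
```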
Would be happy to learn if others have figured it out and can explain it. There's a lot of contradictory info out there.
@grimulkan Trust me, I am confused about the situation and completely agree with you that intuitively, it would be better to train with a single space, but it caused problems when using the default llama-2 chat template in inference.
The official repo (llama-recipes) from meta has this:
https://github.com/facebookresearch/llama-recipes/blob/cecad84841d669413a61b68b95c22c8d33a98f03/src/llama_recipes/inference/chat_utils.py#L56
In their chat utils, they format the inference prompt with no space after the last [/INST].
Unfortunately, their repo doesn't have a complete example for fine-tuning, but they have a custom dataset example, which also has no space:
https://github.com/facebookresearch/llama-recipes/blob/cecad84841d669413a61b68b95c22c8d33a98f03/examples/custom_dataset.py#L14
I really don't know.
Maybe that's the explanation: Meta used their own inference code (in chat_utils.py), which doesn't include the space, and that matches how they trained the model. But Oobabooga (and probably other clients we use) does add the space, I think. So technically, what we're calling Llama-chat format is not really Llama-chat in some clients. I'm not even sure the Ooba template structure allows for no space there (unless they forcibly strip() the input each time).