
Undi95/QwQ-RP-LoRA
That's what some of my datasets do, but then you're still stuck with only one trained reply, not an entire conversation.
I'm breaking my head over that haha
Edit: I misread. If you add multiple thinking blocks in the context, the model gets confused, because the chat template trims them out of the context so we don't waste tokens we no longer need.
So we can't train it like this either, because the bot would have multiple thinking processes in the conversation.
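To make the mismatch concrete, here's a rough sketch of the kind of trimming I mean (my own illustration, not the actual chat template code): at inference only the current turn keeps its <think> block, while earlier assistant messages get theirs stripped, so training samples that keep thinking in every reply never match what the model actually sees.

```python
import re

# Assumed <think>...</think> markup; the regex and helper names are mine.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def trim_old_thinking(messages):
    """Drop <think> blocks from every assistant turn except the last one."""
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=None,
    )
    trimmed = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i != last_assistant:
            trimmed.append({**m, "content": THINK_RE.sub("", m["content"])})
        else:
            trimmed.append(m)
    return trimmed

conversation = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "<think>greet back</think>Hello!"},
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "<think>simple math</think>4."},
]

for m in trim_old_thinking(conversation):
    print(m["role"], ":", m["content"])
```

So a multi-turn training sample with thinking left in every assistant turn is a context layout the model never sees at inference.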
You could do that, but in that case the bot won't use <think>,
because it wasn't trained on every reply to do it.
What I would ideally want is a model that applies the thinking by itself, without a system prompt or prefilling.