It doesn't work with llama.cpp

#2 opened by JLouisBiz

With llama.cpp, the model seems to only know its own prompt; it doesn't receive the user's prompt.

Knowledge Engineer Group @ Tsinghua University org

@bys0318 I tried bartowski's IQ4_XS quant of this model in LM Studio and asked it to write a story. It produces its initial thought process outside of a thinking block, then emits a thinking block containing further reasoning, before going on to write the story. This behavior is rather odd, because I would expect it to put all of its thoughts inside the thinking block. Also, I needed to add a stop string, or else generation would not stop. Is there something wrong with the chat template, or is it llama.cpp / LM Studio's implementation of the chat template?

Knowledge Engineer Group @ Tsinghua University org

Hi! We haven't tested usage with llama.cpp 😢, so there may indeed be some misalignment. You might try following the usage information on our model card page, particularly the `format_prompt_with_template(prompt)` function and the `stop_strings`, especially `"</answer>"`. Aligning with these may resolve the issue.
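
Roughly, the alignment looks like this in plain `transformers` (a minimal sketch: the model id and the template body below are assumed placeholders for illustration, and the authoritative `format_prompt_with_template` is the one on the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-id"  # placeholder; use this model's actual repo id

def format_prompt_with_template(prompt: str) -> str:
    # Assumed structure for illustration only; copy the real template
    # from the model card page.
    return f"<|user|>\n{prompt}\n<|assistant|>\n"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    format_prompt_with_template("Write a short story."),
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    stop_strings=["</answer>"],  # the stop string mentioned above
    tokenizer=tokenizer,         # generate() requires the tokenizer when stop_strings is set
)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The key point is that the prompt format and the stop strings must match what the model was trained with; a llama.cpp or LM Studio chat template that diverges from them can produce the stray thinking text and non-stopping behavior described above.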
