Broken prompt?
This is pretty cool. However, it appears the Jinja chat template is broken, so I used a fallback.
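In case it helps anyone else, a fallback can be as simple as hand-formatting ChatML. This is just a sketch under the assumption that this Qwen-based model expects Qwen-style ChatML tags; the function name is my own, and you'd adjust the tags if the base model uses something else:

```python
# Minimal ChatML-style fallback formatter.
# Assumption: the model expects Qwen-style <|im_start|>/<|im_end|> tags.
def format_chatml(messages, add_generation_prompt=True):
    """messages: list of {"role": ..., "content": ...} dicts."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the prompt open so the model generates the assistant turn.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```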
It seems the IQ1 quant is too damaged to do anything at all.
The IQ2_S can actually reason and write coherent short stories. It fully manages to take on the character from a crazy system prompt and answer questions in character. There is some trash in the output from what looks like minor token corruption. If I enable all the experts, it reverts to junk. However, with experts = 8 and top-k = 8 it even retains alignment: it reasons about whether something conflicts with its alignment and then refuses to answer obviously illegal requests! This is really cool.
Quantizing the KV cache on IQ2_S to q4_0, it can't manage to think or exit the thinking block. At q8_0 it can think but often fails to close the thinking block or reply. f16 seems reliable.
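For anyone trying to reproduce this with llama.cpp directly, the settings above roughly map to these llama-cli flags (the model filename is a placeholder, and the --override-kv key assumes the architecture metadata name, so check your GGUF's metadata first):

```shell
# Placeholder model path; KV cache types: f16 (reliable here), q8_0, q4_0.
./llama-cli -m model-IQ2_S.gguf \
  --cache-type-k f16 --cache-type-v f16 \
  --top-k 8 \
  --override-kv qwen3moe.expert_used_count=int:8 \
  -p "Hello"
```

The --override-kv line caps the number of active experts; the key prefix (here assumed to be qwen3moe) must match the architecture string stored in the GGUF.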
Thanks for the feedback. When I get some time I will take a look at the chat template and compare it to the original model. Maybe it was changed when the author created this model from the original Qwen model.
I can see that the original model files do not include a chat template, so these GGUFs also do not include one. I have added a requantization to the queue using the base model's chat_template.