Version Q4 K XL works great with llama.cpp
#3
by
jeffwadsworth
- opened
Inference (tks/s) is excellent as well. Great work. So far, output is on par with the web chat version.
Do you know how to enable/disable thinking?
Do you know how to enable/disable thinking?
I think you can put /nothink in the system prompt or message. But I'm still downloading so I can't test.