Version Q4 K XL works great with llama.cpp

#3
by jeffwadsworth - opened

Inference (tks/s) is excellent as well. Great work. So far, output is on par with the web chat version.

Do you know how to enable/disable thinking?

Do you know how to enable/disable thinking?

I think you can put /nothink in the system prompt or message. But I'm still downloading so I can't test.

Sign up or log in to comment