--leave-output-tensor !
#13 opened by ZeroWw
All quants should also have an alternate version quantized with --leave-output-tensor, so we can see whether that ~30% bigger file actually performs better...
@ZeroWw would you like those for this model or for any other specific model, and in any specific sizes? I'll try to include them in future models.
I ran some tests... the model that currently holds up best under quantization is Mistral-7b-Instruct-v0.2.
With llama-3-8b I am getting terrible results even at q8_0.
Thanks for the offer though.