Alternative quantizations.

#7 opened by ZeroWw

https://huggingface.co/ZeroWw/Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF

My own (ZeroWw) quantizations: output and embed tensors quantized to f16, all other tensors quantized to q5_k or q6_k.
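
For anyone who wants to reproduce this kind of mixed quant, here is a rough sketch using llama.cpp's llama-quantize tool and its --output-tensor-type / --token-embedding-type flags; the file names are placeholders, and it assumes you already have an f16 GGUF conversion of the model:

```sh
# keep output.weight and the token embeddings at f16, quantize everything else to q6_k
./llama-quantize --output-tensor-type f16 --token-embedding-type f16 \
    model.f16.gguf model.f16.q6.gguf q6_k

# same recipe with a q5_k body for the smaller variant
./llama-quantize --output-tensor-type f16 --token-embedding-type f16 \
    model.f16.gguf model.f16.q5.gguf q5_k
```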

Result: both the f16.q6 and f16.q5 files are smaller than a standard q8_0 quantization, and they perform as well as the pure f16.
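
A rough back-of-envelope on the size claim (my numbers, assuming the usual effective bit rates in llama.cpp: ~8.5 bits/weight for q8_0, ~6.56 for q6_k, ~5.5 for q5_k): Llama-3-8B has ~8.0B parameters, of which the token embedding and output tensors are ~0.53B each (128256 x 4096, untied). Keeping those ~1.05B weights at f16 costs ~2.1 GB, and the remaining ~7.0B at q6_k is ~5.7 GB, so ~7.8 GB total; the q5_k body lands around ~6.9 GB. A q8_0 quant of all tensors is ~8.5 GB, so both mixed variants do come out smaller.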

@failspy hello! Thanks for the abliterated versions. Could you please also do this one: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k

And Mistral Instruct v0.3?