What quant is best?

by jukofyork - opened 13 days ago

Owner 13 days ago

I'd be interested in hearing which quant people find works best for them.

For CUDA, IQ4_XS seems to give the highest throughput and lowest latency, but I wonder whether a higher-bitrate quant, with slightly more latency but higher accuracy, might help the dynamic --draft-p-min option in llama.cpp?
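
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It uses the standard speculative-decoding estimate for a fixed draft length and assumes each drafted token is accepted independently with the same probability, so it does not model the dynamic probability cutoff. The function names and every number in it are hypothetical placeholders, not measurements of any quant.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass.

    alpha: per-token acceptance probability of the draft (assumed i.i.d.).
    k:     number of tokens drafted before each target-model pass.
    """
    if alpha >= 1.0:
        # Every drafted token is accepted, plus one token from the target pass.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha**k.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, cost_ratio: float) -> float:
    """Estimated speedup over plain decoding with the target model.

    cost_ratio: time for one draft-model token divided by the time for
                one target-model pass (smaller means a faster draft).
    """
    return expected_tokens_per_step(alpha, k) / (k * cost_ratio + 1.0)


if __name__ == "__main__":
    # Hypothetical values only: a faster, less accurate draft quant
    # versus a slower, more accurate one.
    candidates = {
        "faster draft (assumed)": {"alpha": 0.70, "cost_ratio": 0.05},
        "more accurate draft (assumed)": {"alpha": 0.75, "cost_ratio": 0.07},
    }
    for name, params in candidates.items():
        speedup = estimated_speedup(params["alpha"], k=8, cost_ratio=params["cost_ratio"])
        print(f"{name}: estimated speedup {speedup:.2f}x")
```

Plugging in acceptance rates and per-token timings measured on your own hardware should show whether the extra accuracy of a larger quant pays for its extra latency.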
