3.5bpw quant request

#1
by OrangeApples - opened

Hi @zaq-hack! Thanks for the uploads. Could I request a 3.5bpw exl2 quant of this? I'd like to try this model at as high a bpw as possible while still having decent context (~16k) on my 24GB card.

Have you tried this one? https://huggingface.co/zaq-hack/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-bpw364-h6-exl2 You should be able to get 12-16k context on it.

I didn't find the rpcal made a significant difference. It seems like it should, and maybe there are higher peaks of better prose, but it's very subjective. I couldn't find any objective measurement where it did better.

If that's still a touch too big, I'll consider a 3.5.

(I own a 3090 and a 3060, and so my "fits in a 3090" meter sometimes spills over a tiny, tiny bit and I don't notice.)
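For a rough sense of what "fits in a 3090" means here, a back-of-the-envelope sketch may help. It assumes about 46.7B total parameters for Mixtral 8x7B, 32 layers with 8 KV heads of dimension 128, and an FP16 cache by default; it ignores activation buffers, the higher-precision head layer, and framework overhead, so real usage will be somewhat higher. The function names are made up for illustration and are not part of exllamav2.

```python
# Rough VRAM estimate for an exl2 quant of Mixtral 8x7B.
# All constants below are assumptions for illustration, not measured values.

TOTAL_PARAMS = 46.7e9  # assumed approximate total parameter count
NUM_LAYERS = 32        # assumed transformer layer count
NUM_KV_HEADS = 8       # assumed key/value heads (grouped-query attention)
HEAD_DIM = 128         # assumed dimension per attention head
GIB = 2**30


def weight_gib(bpw, params=TOTAL_PARAMS):
    """Approximate size of the quantized weights in GiB at a given bits-per-weight."""
    return params * bpw / 8 / GIB


def kv_cache_gib(context_tokens, bytes_per_value=2):
    """Approximate KV cache size in GiB (keys + values across all layers)."""
    bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_tokens * bytes_per_token / GIB


def estimate_gib(bpw, context_tokens, bytes_per_value=2):
    """Weights plus KV cache; excludes activations and runtime overhead."""
    return weight_gib(bpw) + kv_cache_gib(context_tokens, bytes_per_value)


if __name__ == "__main__":
    for bpw in (3.0, 3.5, 3.64):
        total = estimate_gib(bpw, 16 * 1024)
        print(f"{bpw:.2f} bpw + 16k FP16 cache: ~{total:.1f} GiB before overhead")
```

Under these assumptions the 3.64bpw quant with a 16k cache lands within a couple of GiB of a 24GB card's limit, which is why it can spill over once overhead is added, while 3.5bpw leaves a bit more room. Passing `bytes_per_value=1` models an 8-bit cache, which halves the cache estimate.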

The difference in output quality between this bpw300 and that bpw364 seems large in my testing; I almost consistently prefer this rpcal model over the 364.

@zaq-hack thanks! The rpcal version is what I was after, since I wanted to compare it to the base model, which I'm familiar with. I'll give this 3bpw quant a shot though, especially after reading what @tentx had to say about it. Cool setup, by the way! The more 3090 + 3060 setups I read about, the more I regret selling my 3060 Ti hehe.

3.5 in progress ...
