Any chance of a 3.5bpw?

#1
by smcleod - opened

Howdy Mike,

Thanks for quantising these models, it's really appreciated!

I was just wondering if there's any chance you'd be able to do a 3.5bpw?
I have 1x 3090 and 2x A4000, which gets me a total of 56GB. I figure 3.5bpw would be about as high as I could go with ~16-32K (4-bit) context.
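For anyone checking the same sums, here is a rough back-of-the-envelope sketch of the weight footprint. It assumes roughly 104B parameters for Command R+ and treats every weight as stored at the average bits per weight; the function name `estimate_weight_gb` is made up for illustration. It ignores the higher-precision head layer (the h6/h8 part), the KV cache, and runtime overhead, so treat the result as a lower bound and not a measurement.

```python
def estimate_weight_gb(n_params: float, bpw: float) -> float:
    """Approximate size of quantised weights in GB (10^9 bytes).

    Assumes all parameters are stored at the average bits per weight.
    """
    return n_params * bpw / 8 / 1e9


# Assumption: ~104B parameters for Command R+.
N_PARAMS = 104e9

# 1x 3090 (24 GB) + 2x A4000 (16 GB each), as described in the post.
TOTAL_VRAM_GB = 24 + 16 + 16

weights_gb = estimate_weight_gb(N_PARAMS, 3.5)
headroom_gb = TOTAL_VRAM_GB - weights_gb

print(f"weights:  ~{weights_gb:.1f} GB")
print(f"headroom: ~{headroom_gb:.1f} GB for context cache and overhead")
```

Under those assumptions the weights take about 45.5 GB, which leaves roughly 10 GB across the three cards for the quantised context cache and everything else.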

Owner

Starting now. You want h6 or h8?

Legend! I think h6 would be fine. Thanks 🙏

Owner

Glad you said that, because h6 is what I decided to upload. It should take ~2 hours, and it will be accessible here once the upload is complete: https://huggingface.co/MikeRoz/c4ai-command-r-plus-08-2024-3.5bpw-h6-exl2

Thanks, I really appreciate that.

Owner

It's up.

MikeRoz changed discussion status to closed

Fits like a glove with 32K context!


It's not the fastest, but it works pretty well:

INFO:     Metrics (ID: e00542d82c61496093eea52f00e3b6c0): 603 tokens generated
in 70.06 seconds (Queue: 0.0 s, Process: 0 cached tokens and 1814 new tokens at
299.96 T/s, Generate: 9.42 T/s, Context: 1814 tokens)
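As a quick sanity check on that log line, the sketch below splits the total into prompt processing and generation time using only the figures reported above. The helper name `phase_seconds` is made up for illustration, and it assumes the total is simply the sum of the two phases (the queue time is reported as 0.0 s).

```python
def phase_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time spent on a phase, given its token count and throughput."""
    return tokens / tokens_per_second


# Figures copied from the metrics line above.
prompt_s = phase_seconds(1814, 299.96)  # prompt processing
generate_s = phase_seconds(603, 9.42)   # token generation
total_s = prompt_s + generate_s

print(f"prompt:   {prompt_s:.2f} s")
print(f"generate: {generate_s:.2f} s")
print(f"total:    {total_s:.2f} s")
```

The two phases add up to the reported 70.06 seconds, with generation accounting for almost all of it (about 64 s versus about 6 s for the prompt).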

Thanks again!
