Game-changer for 4x24GB setups! AWQ request

#1 by hyunw55

If a 4-bit quantized version becomes available, this would be perfect for my 4x24GB setup! An MoE architecture at this size is absolutely game-changing!
Huge thanks for this release. I'd love to see AWQ-quantized versions if possible; that would make this model legendary for multi-GPU inference.
You guys rock! 🔥
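
In the meantime, here is a minimal sketch of what multi-GPU serving could look like once an AWQ checkpoint exists, using vLLM with tensor parallelism across the four cards. The repo id below is a placeholder, not an actual release:

```python
# Hypothetical sketch: serving an AWQ 4-bit checkpoint across 4 GPUs with vLLM.
# "org/model-AWQ" is a placeholder repo id, not a real released artifact.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-AWQ",         # placeholder AWQ repo id
    quantization="awq",            # weights are AWQ 4-bit
    tensor_parallel_size=4,        # shard across the 4x24GB cards
    gpu_memory_utilization=0.90,   # leave headroom for activations and KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello!"], params)
print(outputs[0].outputs[0].text)
```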

How much context can you actually fit, given that the model still uses full MHA?

Roughly 1 GB per 1k tokens of context.
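
That figure is consistent with full MHA at this scale. A back-of-the-envelope calculation, using assumed hyperparameters (60 layers, hidden size 4096, fp16 cache; the real values are in config.json):

```python
# KV-cache sizing for full MHA (no GQA/MQA). All hyperparameters below are
# assumptions for illustration; check the model's config.json for the real ones.
num_layers = 60        # assumed
hidden_size = 4096     # assumed; with MHA the cache keeps all heads (num_heads * head_dim)
bytes_per_elem = 2     # fp16/bf16 KV cache

# K and V each store `hidden_size` values per layer per token
bytes_per_token = 2 * num_layers * hidden_size * bytes_per_elem
gb_per_1k_tokens = bytes_per_token * 1024 / 1024**3
print(f"~{gb_per_1k_tokens:.2f} GB per 1k tokens")  # ~0.94 GB, i.e. roughly 1 GB per 1k context
```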

We need FP8 versions!
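
Until an official FP8 checkpoint appears, one option worth noting: vLLM can quantize the weights (and the KV cache) to FP8 on the fly on GPUs that support it. A hedged sketch, again with a placeholder repo id:

```python
# Hypothetical sketch: on-the-fly FP8 with vLLM on FP8-capable GPUs.
# "org/model" is a placeholder repo id.
from vllm import LLM

llm = LLM(
    model="org/model",        # placeholder
    quantization="fp8",       # quantize weights to FP8 at load time
    kv_cache_dtype="fp8",     # roughly halves KV-cache memory vs. fp16
    tensor_parallel_size=4,
)
```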
