Game-changer for 4x24GB setups! AWQ request
#1 opened by hyunw55
If a 4-bit quantized version becomes available, this would be PERFECT for my 4x24GB setup! An MoE architecture at this size is absolutely game-changing!
Huge thanks for this release - would love to see AWQ-quantized versions if possible. That would make this model absolutely legendary for multi-GPU inference.
You guys rock! 🔥
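For reference, this is roughly how an AWQ checkpoint would get sharded across the four cards once one exists - just a sketch assuming a hypothetical repo id and the usual transformers + accelerate path (with autoawq installed); nothing here is an official quant:

```python
# Sketch: loading a (hypothetical) AWQ checkpoint across 4x24GB GPUs.
# The repo id below is a placeholder - no AWQ quant of this model exists yet,
# which is exactly the request.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "someuser/model-AWQ"  # placeholder, not a real repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run in fp16
    device_map="auto",          # let accelerate split layers across the GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```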
How much context will you be able to fit, given that the model still uses MHA?
Roughly 1GB per 1k tokens of context.
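If you want to sanity-check that number: with plain MHA every head's K and V get cached, so KV-cache size scales with layers × hidden size × sequence length. A minimal back-of-envelope sketch, using placeholder dimensions (the real values are in the model's config.json):

```python
# Back-of-envelope KV-cache size for a full-MHA model (no GQA/MQA sharing).
# Layer count and hidden size below are placeholders, NOT this model's config.

def kv_cache_gb(num_layers: int, hidden_size: int, seq_len: int,
                bytes_per_elem: int = 2) -> float:
    """GB of KV cache for one sequence at fp16/bf16 (2 bytes per element)."""
    # 2x for separate K and V; with full MHA every head is cached,
    # so the cached width per layer equals the full hidden size.
    total_bytes = 2 * num_layers * hidden_size * seq_len * bytes_per_elem
    return total_bytes / 1024**3

# Example: a placeholder 60-layer, 6144-wide model at 32k context.
print(f"{kv_cache_gb(60, 6144, 32_768):.1f} GB")  # ~45 GB spread over the GPUs
```

With those placeholder dimensions it works out to about 1.4GB per 1k tokens, so the ~1GB/1k figure is the right order of magnitude for a model this size.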
We need FP8 versions!