Game-changer for 4x24GB setups! AWQ request

#1 by hyunw55

If a 4-bit quantized version becomes available, this would be perfect for my 4x24GB setup! An MoE architecture at this size is absolutely game-changing!
Huge thanks for this release. I'd love to see AWQ-quantized versions if possible; that would make this model legendary for multi-GPU inference.
You guys rock! 🔥
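
In the meantime, here is a minimal sketch of what multi-GPU serving could look like once an AWQ checkpoint exists, using vLLM with tensor parallelism across the four cards. The repo id below is a placeholder, not an actual release:

```python
# Hypothetical sketch: serving an AWQ 4-bit checkpoint across 4 GPUs with vLLM.
# "org/model-AWQ" is a placeholder repo id, not a real released artifact.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-AWQ",         # placeholder AWQ repo id
    quantization="awq",            # weights are AWQ 4-bit
    tensor_parallel_size=4,        # shard across the 4x24GB cards
    gpu_memory_utilization=0.90,   # leave headroom for activations and KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello!"], params)
print(outputs[0].outputs[0].text)
```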

How much context can you actually fit, given that the model still uses full MHA?

Roughly 1 GB per 1k tokens of context.
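
That figure is consistent with full MHA at this scale. A back-of-the-envelope calculation, using assumed hyperparameters (60 layers, hidden size 4096, fp16 cache; the real values are in config.json):

```python
# KV-cache sizing for full MHA (no GQA/MQA). All hyperparameters below are
# assumptions for illustration; check the model's config.json for the real ones.
num_layers = 60        # assumed
hidden_size = 4096     # assumed; with MHA the cache keeps all heads (num_heads * head_dim)
bytes_per_elem = 2     # fp16/bf16 KV cache

# K and V each store `hidden_size` values per layer per token
bytes_per_token = 2 * num_layers * hidden_size * bytes_per_elem
gb_per_1k_tokens = bytes_per_token * 1024 / 1024**3
print(f"~{gb_per_1k_tokens:.2f} GB per 1k tokens")  # ~0.94 GB, i.e. roughly 1 GB per 1k context
```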

We need FP8 versions!
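
Until an official FP8 checkpoint appears, one option worth noting: vLLM can quantize the weights (and the KV cache) to FP8 on the fly on GPUs that support it. A hedged sketch, again with a placeholder repo id:

```python
# Hypothetical sketch: on-the-fly FP8 with vLLM on FP8-capable GPUs.
# "org/model" is a placeholder repo id.
from vllm import LLM

llm = LLM(
    model="org/model",        # placeholder
    quantization="fp8",       # quantize weights to FP8 at load time
    kv_cache_dtype="fp8",     # roughly halves KV-cache memory vs. fp16
    tensor_parallel_size=4,
)
```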
