Request for a 40-60B Parameter Model Compatible with A100 (80GB)
Dear developers,
As the owner of a single A100 (80GB), I would like to kindly request the release of a high-quality language model in the 40-60B parameter range that can run efficiently on this hardware. Ideally, it would be great if the model could be optimized with AWQ quantization (or other efficient methods) to ensure practical inference speeds while maintaining accuracy.
Priority engine: vLLM
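To make the intended setup concrete, here is a rough sketch of how I would expect to serve such a model with vLLM's AWQ support (the model name is a placeholder, not an existing checkpoint, and the memory settings are only a starting point):

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- any ~40-60B AWQ checkpoint would go here.
llm = LLM(
    model="some-org/some-50b-instruct-awq",
    quantization="awq",            # use vLLM's AWQ kernels
    dtype="float16",
    max_model_len=8192,            # keep the KV cache within the 80GB budget
    gpu_memory_utilization=0.90,   # leave a little headroom on the A100
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Summarize the advantages of AWQ quantization."], params)
print(outputs[0].outputs[0].text)
```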
Such a model would be incredibly valuable for researchers and developers with similar hardware constraints. Thank you for considering this suggestion!
Best regards,
Potanin Marat
If it's just single-user inference, one should try KTransformers.
Best to wait for Llama 4. DeepSeek loves the MoE architecture, so they can't really release a model of that size without going dense.
A few (relatively affordable) options for 128GB inference hardware are coming as well (DGX Spark and Ryzen AI 395). ~80GB might be a good target, i.e. a model that comes in around 80GB in Q4_K_M.
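For a rough sense of the numbers, here is a back-of-the-envelope estimate of weight memory only (the bits-per-weight figures are approximate, and KV cache and runtime overhead are ignored):

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters (in billions) times bits per weight, divided by 8."""
    return params_billions * bits_per_weight / 8

print(weight_gb(50, 4.0))    # ~25 GB: a 50B model in 4-bit AWQ fits easily on an 80GB A100
print(weight_gb(80, 4.85))   # ~48 GB: an 80B model in Q4_K_M (~4.85 bits/weight, approximate)
print(weight_gb(130, 4.85))  # ~79 GB: a ~130B model in Q4_K_M lands near the 80GB mark
```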
Better to wait for the distills.
Why not use QwQ 32B? It is way overpowered for its size.
There is one, brother, there is.
I would like an LLM with a larger number of parameters, around 50B.