Request for a 40-60B Parameter Model Compatible with A100 (80GB)
Dear developers,
As the owner of a single A100 (80GB), I would like to kindly request the release of a high-quality language model in the 40-60B parameter range that can run efficiently on this hardware. Ideally, it would be great if the model could be optimized with AWQ quantization (or other efficient methods) to ensure practical inference speeds while maintaining accuracy.
Priority engine: vLLM
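To make the intended setup concrete, here is a rough sketch of how I would expect to serve such a model with vLLM's AWQ support (the model name is a placeholder, not an existing checkpoint, and the memory settings are only a starting point):

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- any ~40-60B AWQ checkpoint would go here.
llm = LLM(
    model="some-org/some-50b-instruct-awq",
    quantization="awq",            # use vLLM's AWQ kernels
    dtype="float16",
    max_model_len=8192,            # keep the KV cache within the 80GB budget
    gpu_memory_utilization=0.90,   # leave a little headroom on the A100
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Summarize the advantages of AWQ quantization."], params)
print(outputs[0].outputs[0].text)
```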
Such a model would be incredibly valuable for researchers and developers with similar hardware constraints. Thank you for considering this suggestion!
Best regards,
Potanin Marat
If it's just single-user inference, one should try KTransformers.
Best to wait for Llama 4. DeepSeek loves the MoE architecture, so they can't really release a model of that size without going dense.
A few (relatively affordable) options for 128GB inference hardware are coming as well (DGX Spark and Ryzen AI 395). ~80GB might be a good target, i.e. a model that comes in around 80GB in Q4_K_M.
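For a rough sense of the numbers, here is a back-of-the-envelope estimate of weight memory only (the bits-per-weight figures are approximate, and KV cache and runtime overhead are ignored):

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters (in billions) times bits per weight, divided by 8."""
    return params_billions * bits_per_weight / 8

print(weight_gb(50, 4.0))    # ~25 GB: a 50B model in 4-bit AWQ fits easily on an 80GB A100
print(weight_gb(80, 4.85))   # ~48 GB: an 80B model in Q4_K_M (~4.85 bits/weight, approximate)
print(weight_gb(130, 4.85))  # ~79 GB: a ~130B model in Q4_K_M lands near the 80GB mark
```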
Better to wait for the distills.
Why not use QwQ 32B? It is way overpowered for its size.
There is one, brother, there is.
I would like an LLM with a larger number of parameters, around 50B.