Please, hurry up and release the quantized version!
#23 by zy19898 · opened
It's already done. On the model card page, on the right, there is a link to the quantized versions, which the site tracks automatically:
https://huggingface.co/models?other=base_model:quantized:deepseek-ai/DeepSeek-V3-0324
But beware: very large models are unusable at low quants; the usable minimum is Q5 or Q6. At Q5 the older V3 used 502 GB of combined RAM+VRAM, and at Q6 it needs 568 GB just to start.
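
As a rough sanity check, here is a back-of-the-envelope sketch, assuming ~671B parameters for V3 and approximate bits-per-weight figures for llama.cpp K-quants (both are my assumptions, not numbers from this thread); the real footprint also includes KV cache and runtime overhead:

```python
# Back-of-the-envelope memory estimate for a ~671B-parameter model.
# PARAMS and the bits-per-weight values are approximations (assumptions).
PARAMS = 671e9

for quant, bpw in [("Q5_K_M", 5.5), ("Q6_K", 6.6)]:
    weights_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{weights_gb:.0f} GB for the weights alone")

# Prints roughly 461 GB (Q5) and 554 GB (Q6), in the same ballpark as the
# 502 GB / 568 GB figures above once KV cache and overhead are added.
```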
I do not recommend Ollama, which is the last launcher that refuses to work with models split into parts: you would be downloading Q5 as a single file for a whole week, even with a stable connection. All other llama.cpp-based launchers work with split models; see the sketch below.
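
For example, a minimal sketch of grabbing only the split Q5 parts with the `huggingface_hub` Python package. The repo id and filename pattern here are hypothetical; check the actual file list of whichever GGUF repo you pick from the link above:

```python
# Minimal sketch: download only the split Q5_K_M GGUF parts of a quantized repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="someuser/DeepSeek-V3-0324-GGUF",  # hypothetical repo id
    allow_patterns=["*Q5_K_M*"],               # fetch only the Q5_K_M parts
)
print(local_dir)

# llama.cpp finds the remaining *-of-N.gguf parts automatically when you
# point it at part 1, e.g.:
#   ./llama-server -m <local_dir>/DeepSeek-V3-0324-Q5_K_M-00001-of-000XX.gguf
```

The upside of split files is that an interrupted download resumes per part instead of restarting a 500+ GB single file.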