Will a quantised version be available?

by angerhang - opened Oct 17, 2024

Oct 17, 2024

Thanks for sharing. What are the recommended ways to quantise this model?
Or will a quantised model be made available so that inference is less resource-intensive?

Thanks

victor

Oct 17, 2024

Have you seen the community quantizations listed here? https://huggingface.co/models?other=base_model:quantized:nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Use the model tree section on model pages to see what quantizations are available.
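
For anyone who prefers to check this programmatically, here is a minimal sketch. The tag format is taken from the URL above. The helper names (`quantized_tag`, `quantized_models_url`, `list_quantized_models`) are hypothetical, and the last one assumes that the `huggingface_hub` package is installed, that network access is available, and that the Hub accepts this tag as a `filter` value.

```python
BASE_MODEL = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"


def quantized_tag(base_model: str) -> str:
    """Build the tag the Hub uses for quantizations of a base model."""
    return f"base_model:quantized:{base_model}"


def quantized_models_url(base_model: str) -> str:
    """Build the model-listing URL in the same form as the link above."""
    return f"https://huggingface.co/models?other={quantized_tag(base_model)}"


def list_quantized_models(base_model: str, limit: int = 20) -> list[str]:
    """Return repo IDs tagged as quantizations of base_model.

    Assumption: filtering by this tag behaves like the web listing.
    Requires huggingface_hub and network access.
    """
    from huggingface_hub import HfApi  # imported lazily; optional dependency

    models = HfApi().list_models(filter=quantized_tag(base_model), limit=limit)
    return [model.id for model in models]


if __name__ == "__main__":
    print(quantized_models_url(BASE_MODEL))
```

Calling `list_quantized_models(BASE_MODEL)` should give roughly the same repos as the web listing, but treat the result as a starting point and check each repo's model card.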

okuchaiev

NVIDIA org Oct 18, 2024

NVIDIA hasn't released any quantized version yet, but there are several community quantization efforts, as mentioned above.

yangwang92

Oct 22, 2024

We also provide quantized versions at 4 to 1.5 bits, made with VPTQ (https://github.com/microsoft/VPTQ), here: https://huggingface.co/collections/VPTQ-community/vptq-llama-31-nemotron-70b-instruct-hf-without-finetune-671730b96f16208d0b3fe942 . Feel free to give us feedback!

mysticbeing

Nov 6, 2024

edited Nov 6, 2024

This FP8 dynamic quantization runs on a single H100 or A100 (80 GB): https://huggingface.co/mysticbeing/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-DYNAMIC
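
As a rough sanity check on why lower bit widths matter for fitting on one 80 GB card, here is a back-of-the-envelope sketch of weight memory only. It assumes a round 70 billion parameters and counts 1 GB as 10^9 bytes. It ignores the KV cache, activations, and runtime overhead, so real usage is higher. The function name `weight_memory_gb` is just for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 70e9  # assumption: round figure for a "70B" model

# Bit widths mentioned in this thread, plus 16-bit as the unquantized baseline.
for label, bits in [("16-bit", 16), ("FP8", 8), ("4-bit", 4), ("1.5-bit", 1.5)]:
    size = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if size < 80 else "does not fit"
    print(f"{label:>8}: {size:7.2f} GB of weights, {verdict} in 80 GB")
```

Under these assumptions the weights take about 140 GB at 16 bits, 70 GB at 8 bits, and 35 GB at 4 bits, so 8-bit leaves only limited headroom on an 80 GB card for context and batch size.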
