FP8 Quants please

#3
by rjmehta - opened

Amazing model. Thank you. FP8 Quants please.

Download the model, then serve it with vLLM using `-q fp8` (short for `--quantization fp8`).
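As a minimal sketch of that command, assuming a placeholder repo ID `org/model` (substitute the actual repository name for this model):

```shell
# Serve with vLLM, quantizing to FP8 at load time.
# "org/model" is a placeholder; replace with the real model ID.
vllm serve org/model --quantization fp8
```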

Warning
Currently, we load the model at original precision before quantizing down to 8-bits, so you need enough memory to load the whole model.

Does it require 2x H100 to load the model before it is quantized to fp8?
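For a back-of-envelope answer: loading at original precision means roughly 2 bytes per parameter (bf16/fp16). As a sketch, assuming a hypothetical 70B-parameter model (substitute the real parameter count from the config):

```shell
# Rough memory needed to load weights at original precision
# before FP8 quantization. 70B params is an assumption, not
# taken from this model's config.
PARAMS=70000000000
BYTES_PER_PARAM=2   # bf16/fp16
echo "$((PARAMS * BYTES_PER_PARAM / 1000000000)) GB to load before quantization"
```

Under that assumption, the weights alone exceed a single 80 GB H100, so two H100s would be needed just to load the model before quantization; after quantizing to FP8 the footprint roughly halves.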
