FP8 Quants please

#3
by rjmehta - opened

Amazing model. Thank you. FP8 Quants please.

Download the model, then serve it with vLLM using `-q fp8` (short for `--quantization fp8`).
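As a minimal sketch of that command, assuming a placeholder repo ID `org/model` (substitute the actual repository name for this model):

```shell
# Serve with vLLM, quantizing to FP8 at load time.
# "org/model" is a placeholder; replace with the real model ID.
vllm serve org/model --quantization fp8
```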

Warning
Currently, we load the model at original precision before quantizing down to 8-bits, so you need enough memory to load the whole model.

Does it require 2x H100 to load the model before it is quantized to fp8?
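For a back-of-envelope answer: loading at original precision means roughly 2 bytes per parameter (bf16/fp16). As a sketch, assuming a hypothetical 70B-parameter model (substitute the real parameter count from the config):

```shell
# Rough memory needed to load weights at original precision
# before FP8 quantization. 70B params is an assumption, not
# taken from this model's config.
PARAMS=70000000000
BYTES_PER_PARAM=2   # bf16/fp16
echo "$((PARAMS * BYTES_PER_PARAM / 1000000000)) GB to load before quantization"
```

Under that assumption, the weights alone exceed a single 80 GB H100, so two H100s would be needed just to load the model before quantization; after quantizing to FP8 the footprint roughly halves.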
