Can you provide an FP8 version?

#11
by xjpang85 - opened

Could you provide an FP8 version so the model can run on fewer GPUs?

MiniMax org

Can the int8 or int4 version meet the requirements?

int4 is OK.

MiniMax org

We have submitted a pull request (#13454) to vLLM, and its inference performance is significantly better than the Hugging Face implementation. You can give it a try.
