Qwen
/


Q/prob scaling factors error with vllm

#4
by inversemod - opened

WARNING 05-03 17:40:35 [kv_cache.py:128] Using Q scale 1.0 and prob scale 1.0 with fp8 attention. This may cause accuracy issues. Please make sure Q/prob scaling factors are available in the fp8 checkpoint.
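
One way to narrow this down is to check whether the checkpoint shards contain any Q/prob scale tensors at all. The snippet below is a rough sketch, not vLLM's own loading logic: it reads only the JSON header of each `.safetensors` shard (an 8-byte little-endian length followed by that many bytes of JSON) and lists tensor names ending in `q_scale` or `prob_scale`. Those two suffixes are an assumption about how such scales would be named, and `find_scale_tensors` and `safetensors_tensor_names` are hypothetical helper names made up for this example.

```python
import json
import struct
from pathlib import Path

# Assumed name suffixes for the scaling factors; the warning text does not
# state the actual tensor names, so adjust these if your checkpoint differs.
SCALE_SUFFIXES = ("q_scale", "prob_scale")


def safetensors_tensor_names(path):
    """Return tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of UTF-8 JSON describing the tensors.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional entry that does not describe a tensor.
    return [name for name in header if name != "__metadata__"]


def find_scale_tensors(checkpoint_dir, suffixes=SCALE_SUFFIXES):
    """Map each shard file name to the tensor names ending in one of the suffixes."""
    hits = {}
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        names = [n for n in safetensors_tensor_names(shard) if n.endswith(suffixes)]
        if names:
            hits[shard.name] = names
    return hits
```

Calling `find_scale_tensors("/path/to/local/checkpoint")` on a local copy of the model directory (the path is a placeholder) returns a dict of matches per shard. An empty dict would suggest the checkpoint does not ship these factors under the assumed names, which would be consistent with vLLM falling back to a scale of 1.0 as the warning reports.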
