Quant Error

#2
by abiteddie - opened

I tried to serve the model with the following command:

VLLM_USE_V1=0 VLLM_USE_TRITON_FLASH_ATTN=1 vllm serve '/mnt/models/vllm_models/Qwen3-235B-A22B-Thinking-2507-AWQ/' --dtype float16 --tensor-parallel-size 4 --max-model-len 32768 --pipeline-parallel-size 2 --enable-expert-parallel

It failed with the following error:

AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_marlin_repack'
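For anyone trying to reproduce this, here is a rough diagnostic sketch (not an official vLLM tool). It assumes that vLLM registers its compiled kernels under torch.ops._C when the vllm._C extension module is imported, which is what the error message suggests. The helper name has_vllm_custom_op is made up for illustration. If it prints False, the installed build probably does not include the AWQ Marlin kernels, though that is only an inference from the error.

```python
import importlib
import importlib.util


def has_vllm_custom_op(op_name: str) -> bool:
    """Return True if torch.ops._C exposes `op_name` once vLLM's extension is loaded.

    Hypothetical helper for illustration; assumes the ops live in `vllm._C`.
    """
    # If torch or vllm is not installed, there is nothing to inspect.
    for pkg in ("torch", "vllm"):
        try:
            if importlib.util.find_spec(pkg) is None:
                return False
        except (ImportError, ValueError):
            return False
    try:
        import torch
        # Importing the compiled extension is what registers the custom ops.
        importlib.import_module("vllm._C")
    except Exception:
        # The extension is missing or failed to load in this build.
        return False
    try:
        getattr(torch.ops._C, op_name)
    except (AttributeError, RuntimeError):
        # Same failure mode as the reported AttributeError.
        return False
    return True


if __name__ == "__main__":
    print("awq_marlin_repack available:", has_vllm_custom_op("awq_marlin_repack"))
```

Run it in the same Python environment that the vllm serve command uses, so that it inspects the same build.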
