Quant Error
#2, opened by abiteddie
I tried to serve the model with the following command:

VLLM_USE_V1=0 VLLM_USE_TRITON_FLASH_ATTN=1 vllm serve '/mnt/models/vllm_models/Qwen3-235B-A22B-Thinking-2507-AWQ/' --dtype float16 --tensor-parallel-size 4 --max-model-len 32768 --pipeline-parallel-size 2 --enable-expert-parallel

But it failed with:

AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_marlin_repack'
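For anyone trying to reproduce this, here is a minimal diagnostic sketch that checks whether the installed build registers the op named in the error. It assumes that importing `vllm._C` is what loads vLLM's compiled ops into `torch.ops._C`. The helper `missing_ops` is a hypothetical name of mine, not part of vLLM or PyTorch.

```python
"""Check whether the custom op named in the traceback is registered."""


def missing_ops(namespace, names):
    """Return the names that are not attributes of `namespace`."""
    return [name for name in names if not hasattr(namespace, name)]


def main():
    try:
        import torch
        # Assumption: importing this extension module registers vLLM's
        # compiled ops under torch.ops._C.
        import vllm._C  # noqa: F401
    except (ImportError, OSError) as exc:
        print(f"could not load torch or the vLLM extension: {exc}")
        return
    # The op name is taken verbatim from the AttributeError above.
    absent = missing_ops(torch.ops._C, ["awq_marlin_repack"])
    print("missing ops:", absent or "none")


if __name__ == "__main__":
    main()
```

If this reports `awq_marlin_repack` as missing, the extension in this environment was presumably built without that kernel, which would point at the install or build rather than the launch flags.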