Tensor parallel size
#1, opened by Mitke15
Thank you for providing the AWQ version. Sadly, I can't get it running on 2 GPUs because of this error:
ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
The official Qwen repo had the same issue with a few of their models, but they managed to fix it somehow:
https://github.com/vllm-project/vllm/issues/2699#issuecomment-1956139276
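For context, here is a rough sketch of how I understand the error: with group-wise quantization, each GPU's slice of a layer's input dimension apparently has to hold a whole number of quantization groups. This is my assumption about the check, not vLLM's actual code. The function name `shards_align` and the sizes below are made up for illustration and are not read from this model's config.

```python
def shards_align(input_size: int, group_size: int, tp_size: int) -> bool:
    """Return True if every tensor-parallel shard holds a whole number of
    quantization groups. Hypothetical helper, not part of vLLM."""
    # The input dimension must first split evenly across the GPUs.
    if input_size % tp_size != 0:
        return False
    per_shard = input_size // tp_size
    # Each shard must then be a multiple of the quantization group size.
    return per_shard % group_size == 0


# Illustrative numbers only (assumed, not taken from this model).
input_size = 29568   # 231 groups of 128
group_size = 128

for tp_size in (1, 2):
    print(tp_size, shards_align(input_size, group_size, tp_size))
```

With these assumed numbers the check passes on 1 GPU but fails on 2, because 231 groups can't be split evenly in half, which would match what I'm seeing.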
It'd be great if you could help us out here.
Thanks!