Tensor parallel size

#1
by Mitke15 - opened

Thank you for providing the AWQ version. Unfortunately, I can't get it running on 2 GPUs (tensor parallel size 2). It fails with:

ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
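For context, here is a rough sketch of the kind of alignment check that I believe triggers this error. It assumes the quantized layer's input dimension is split evenly across GPUs and that each GPU's slice must be a multiple of the AWQ group size. The function name `check_awq_tp_alignment`, the default group size of 128, and the layer sizes below are hypothetical values for illustration, not read from this model's config.

```python
def check_awq_tp_alignment(input_size, tp_size, group_size=128):
    """Return (aligned, per_partition_size) for one row-parallel layer.

    Assumption: the input dimension is sharded evenly across tp_size GPUs,
    and each shard must be divisible by the quantization group size.
    """
    if input_size % tp_size != 0:
        # The dimension cannot even be split evenly across the GPUs.
        return False, None
    per_partition = input_size // tp_size
    return per_partition % group_size == 0, per_partition


# Hypothetical layer sizes, for illustration only.
for size in (13696, 11008):
    for tp in (1, 2):
        ok, part = check_awq_tp_alignment(size, tp)
        print(f"input_size={size} tp={tp} per_gpu={part} aligned={ok}")
```

With these illustrative numbers, an input size of 13696 is aligned on a single GPU but not when split in two (6848 is not a multiple of 128), which would match the symptom of the model loading on 1 GPU and failing on 2.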

The official Qwen repo had the same issue with a few of its models, but they managed to fix it somehow:

https://github.com/vllm-project/vllm/issues/2699#issuecomment-1956139276

It'd be great if you could help us out here.
Thanks!
