vLLM outputs gibberish but text-generation-webui is fine

by Light4Bear - opened

vLLM args:

```
python -m vllm.entrypoints.openai.api_server \
    --model LoneStriker_Smaug-70B-v0.1-AWQ \
    --quantization awq \
    --max-model-len 4096 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.9 \
    --disable-custom-all-reduce \
    --max-num-seqs 1 \
    --port 5000 \
    --served-model-name bear \
    --enforce-eager
```
The output is complete nonsense. However, when the same model is loaded in text-generation-webui via AutoAWQ, generation is fine. Any idea why?
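For reference, a minimal sketch of how to reproduce the bad output against the server launched above, assuming it is up on localhost:5000 and serving the model under the name `bear` (per the `--port` and `--served-model-name` flags). Greedy decoding is used to rule out sampling settings as the cause:

```python
# Query vLLM's OpenAI-compatible endpoint (hypothetical repro, not from the thread).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="bear",                      # matches --served-model-name
    prompt="The capital of France is",
    max_tokens=32,
    temperature=0.0,                   # greedy, so sampling can't explain gibberish
)
print(completion.choices[0].text)
```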

My only idea is that the AutoAWQ version bundled with ooba is newer than the one vLLM uses. You might have to file a bug report against vLLM, or ask Casper about the issue on the AutoAWQ GitHub project; he may have to update vLLM's AWQ code.
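One way to narrow this down is to load the same quantized weights directly with AutoAWQ (the loader text-generation-webui wraps) and compare the output; if that generation is coherent, the weights are fine and the problem sits in vLLM's AWQ path. A rough sketch, assuming the model directory used above and a recent AutoAWQ that accepts `device_map` (a 70B model won't fit on one GPU):

```python
# Sanity check with AutoAWQ directly, outside vLLM and outside ooba.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "LoneStriker_Smaug-70B-v0.1-AWQ"  # same local dir passed to vLLM

model = AutoAWQForCausalLM.from_quantized(
    model_path,
    fuse_layers=True,
    device_map="auto",  # assumption: AutoAWQ version new enough to shard across GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()
out = model.generate(inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If this prints sensible text while the vLLM server does not, that points at vLLM's AWQ kernels or its model-loading code rather than the quantized checkpoint itself.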

Light4Bear changed discussion status to closed
