vLLM outputs gibberish but text-generation-webui is fine
#2 opened by Light4Bear
vLLM args:
python -m vllm.entrypoints.openai.api_server --model LoneStriker_Smaug-70B-v0.1-AWQ --quantization awq --max-model-len 4096 --tensor-parallel-size 4 --gpu-memory-utilization 0.9 --disable-custom-all-reduce --max-num-seqs 1 --port 5000 --served-model-name bear --enforce-eager
The output is complete nonsense. However, when the model is loaded in text-generation-webui via AutoAWQ, the generation is fine. Any idea why?
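For reference, a minimal sketch of how one might query the server launched above and observe the output, assuming the standard `openai` Python client (v1+); the prompt and sampling parameters are placeholders, not the original ones:

```python
# Minimal sketch: query the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # matches --port 5000
    api_key="none",  # vLLM does not require a real key by default
)

completion = client.completions.create(
    model="bear",  # matches --served-model-name bear
    prompt="The quick brown fox",  # placeholder prompt
    max_tokens=64,
    temperature=0.7,
)
print(completion.choices[0].text)  # gibberish in the reported setup
```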
My only idea is that the version of AutoAWQ bundled with ooba is newer than the one vLLM uses. You might have to file a bug report against vLLM, or ask Casper about the issue on the AutoAWQ GitHub project; he may have to update vLLM's AWQ code.
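One way to isolate the problem is to load the same checkpoint with AutoAWQ directly, which is roughly what text-generation-webui does under the hood. A minimal sketch, assuming `autoawq` and `transformers` are installed and the model path is the local directory used above; the prompt is a placeholder:

```python
# Minimal sketch: load the checkpoint with AutoAWQ directly to confirm the
# quantized weights themselves are fine.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "LoneStriker_Smaug-70B-v0.1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# If this output is coherent while the vLLM output is not, the issue is
# in vLLM's AWQ path rather than in the quantized weights themselves.
```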
Light4Bear changed discussion status to closed