vLLM outputs gibberish but text-generation-webui is fine

by Light4Bear - opened

vLLM args:

```
python -m vllm.entrypoints.openai.api_server \
    --model LoneStriker_Smaug-70B-v0.1-AWQ \
    --quantization awq \
    --max-model-len 4096 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.9 \
    --disable-custom-all-reduce \
    --max-num-seqs 1 \
    --port 5000 \
    --served-model-name bear \
    --enforce-eager
```
The output is complete nonsense. However, when the same model is loaded in text-generation-webui via AutoAWQ, generation is fine. Any idea why?
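For reference, a minimal sketch of how to reproduce the bad output against the server launched above, assuming it is up on localhost:5000 and serving the model under the name `bear` (per the `--port` and `--served-model-name` flags). Greedy decoding is used to rule out sampling settings as the cause:

```python
# Query vLLM's OpenAI-compatible endpoint (hypothetical repro, not from the thread).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="bear",                      # matches --served-model-name
    prompt="The capital of France is",
    max_tokens=32,
    temperature=0.0,                   # greedy, so sampling can't explain gibberish
)
print(completion.choices[0].text)
```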

My only idea is that the AutoAWQ version bundled with ooba is newer than the one vLLM uses. You might have to file a bug report against vLLM, or ask Casper about the issue on the AutoAWQ GitHub project; he may have to update vLLM's AWQ code.
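One way to narrow this down is to load the same quantized weights directly with AutoAWQ (the loader text-generation-webui wraps) and compare the output; if that generation is coherent, the weights are fine and the problem sits in vLLM's AWQ path. A rough sketch, assuming the model directory used above and a recent AutoAWQ that accepts `device_map` (a 70B model won't fit on one GPU):

```python
# Sanity check with AutoAWQ directly, outside vLLM and outside ooba.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "LoneStriker_Smaug-70B-v0.1-AWQ"  # same local dir passed to vLLM

model = AutoAWQForCausalLM.from_quantized(
    model_path,
    fuse_layers=True,
    device_map="auto",  # assumption: AutoAWQ version new enough to shard across GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()
out = model.generate(inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If this prints sensible text while the vLLM server does not, that points at vLLM's AWQ kernels or its model-loading code rather than the quantized checkpoint itself.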

Light4Bear changed discussion status to closed
