vLLM 0.7.2 starts the model normally, but when I simulate a request with curl there is no output; the request just blocks!
#2 opened by JZMALi
```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name deepseek-r1 \
    --model /root/filesystem/model_r1/DeepSeek-R1-int4-gptq-sym-inc/OPEA/DeepSeek-R1-int4-gptq-sym-inc \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8096 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.9
```
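
The curl request I send looks roughly like the following (the prompt and sampling parameters here are illustrative; the host, port, and model name match the serve command above):

```bash
# Test request against the OpenAI-compatible /v1/chat/completions endpoint;
# "model" must match the --served-model-name passed to the server (deepseek-r1).
curl http://localhost:8096/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 128
    }'
```

The command returns no response and never completes.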
Sorry, we don't have enough resources to run this model on vLLM, so you may want to seek assistance in the vLLM repository. The model follows the standard GPTQ format.