vLLM 0.7.2 starts the model normally, but when I simulate a request with curl there is no output; the request just hangs!

#2 opened by JZMALi

python -m vllm.entrypoints.openai.api_server \
    --served-model-name deepseek-r1 \
    --model /root/filesystem/model_r1/DeepSeek-R1-int4-gptq-sym-inc/OPEA/DeepSeek-R1-int4-gptq-sym-inc \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8096 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --tensor-parallel-size 8 \
    --gpu_memory_utilization 0.9
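
For reference, a request of the kind described (the exact payload is not shown in the original post; the model name and port are taken from the launch command above) would look something like:

curl http://localhost:8096/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
    }'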

Open Platform for Enterprise AI org

Sorry, we don't have enough resources to run this model on vLLM. You may seek assistance in the vLLM repository. This model follows the standard GPTQ format.
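
Since the checkpoint is in the standard GPTQ format, vLLM should pick up the quantization automatically from the model's quantization_config; if needed, it can also be forced with vLLM's general --quantization option. This is a generic vLLM sketch, not a fix confirmed in this thread:

python -m vllm.entrypoints.openai.api_server \
    --model /root/filesystem/model_r1/DeepSeek-R1-int4-gptq-sym-inc/OPEA/DeepSeek-R1-int4-gptq-sym-inc \
    --quantization gptq \
    --trust-remote-code \
    --tensor-parallel-size 8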
