vLLM 0.7.2 starts the model normally, but when I simulate a request with curl there is no output; the request just blocks!
#2 opened by JZMALi
```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name deepseek-r1 \
    --model /root/filesystem/model_r1/DeepSeek-R1-int4-gptq-sym-inc/OPEA/DeepSeek-R1-int4-gptq-sym-inc \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8096 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.9
```
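
The curl request I send looks roughly like the following (the prompt and sampling parameters here are illustrative; the host, port, and model name match the serve command above):

```bash
# Test request against the OpenAI-compatible /v1/chat/completions endpoint;
# "model" must match the --served-model-name passed to the server (deepseek-r1).
curl http://localhost:8096/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 128
    }'
```

The command returns no response and never completes.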
Sorry, we don't have enough resources to run this model on vLLM, so you may want to seek assistance in the vLLM repository. The model follows the standard GPTQ format.