How to deploy the model?

#1 by liangxiaofeng - opened

Could you provide a tutorial on inference deployment?
I failed to deploy the model with both vLLM and SGLang; the commands I used are below.

docker run -itd --name glm46 -p 28001:8000 --shm-size=300g --gpus all -v /models/GLM-4.6-Channel-int8:/models --entrypoint "bash" docker.1ms.run/vllm/vllm-openai:v0.11.0 -c " python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --max-model-len 202752 --trust-remote-code --tensor-parallel-size 8 --served-model-name glm46 --model /models --enable-auto-tool-choice --tool-call-parser glm45 --reasoning-parser glm45"
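For reference, once the vLLM container is up and its logs show the server listening, the OpenAI-compatible endpoint can be sanity-checked with a request like the following. This is a minimal sketch: port 28001 and the model name glm46 come from the `docker run` flags above, and the prompt is an arbitrary placeholder.

```bash
# Minimal OpenAI-compatible chat request against the vLLM container above.
# Port 28001 and "glm46" match the docker run flags; prompt is a placeholder.
curl http://localhost:28001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm46",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64
  }'
```

If this request hangs or the connection is refused, the server never finished loading, and the container logs will show why.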

docker run -itd --name glm46 -p 28000:8000 --shm-size=300g --gpus all -v /models/GLM-4.6-Channel-int8:/models docker.1ms.run/lmsysorg/sglang:latest python3 -m sglang.launch_server --model-path /models --host 0.0.0.0 --port 8000 --context-length 202752 --trust-remote-code --tp 8 --served-model-name glm46 --tool-call-parser glm45 --mem-fraction-static 0.7 --load-format layered
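Without the actual error output it is hard to say why either launch failed. One thing to note: both commands reuse the container name glm46, so Docker will refuse to start the second container while the first still exists. A reasonable first debugging step, sketched below, is to remove the stale container and watch the startup logs for the real failure:

```bash
# Both commands above reuse the container name "glm46"; Docker refuses to
# create a second container with the same name, so remove the old one first:
docker rm -f glm46

# After relaunching, follow the startup logs for the underlying error
# (e.g. CUDA out-of-memory, an unsupported quantization format, or shm size):
docker logs -f glm46
```

Posting the relevant log lines here would make the failure much easier to diagnose.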

Same question.
