vllm docker #4
opened by rcalv002
Hiya,
My stack uses vLLM in Docker. Using the huggingface-cli I've downloaded mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf into my standard Hugging Face hub cache, which is mounted into my vLLM container (as with all the other models). When I try to run it, it fails. Are we missing some files in this repo?
```yaml
vllm:
  image: vllm/vllm-openai:latest
  runtime: nvidia
  command:
    - --model
    - bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF
    - --max_model_len
    - "8192"
    - --tokenizer_mode
    - mistral
    - --config_format
    - mistral
    - --load_format
    - mistral
    - --tool-call-parser
    - mistral
    - --enable-auto-tool-choice
    - --host
    - 0.0.0.0
    - --port
    - "8000"
  ports:
    - "8000:8000"
  environment:
    - NVIDIA_VISIBLE_DEVICES=all
    - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    - HF_TOKEN=${HF_TOKEN}
    - VLLM_API_KEY=${VLLM_API_KEY}
  volumes:
    - ~/.cache/huggingface:/root/.cache/huggingface
```
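For reference, the GGUF example in the vLLM docs passes a local .gguf file path as --model and names the original model repo via --tokenizer, rather than pointing --model at a GGUF repo id. Below is a minimal sketch of the service along those lines. The container-side path is illustrative (the real file sits under the hub cache's models--bartowski--mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/snapshots/<hash>/ directory), and the mistral-specific config/load/tokenizer flags are dropped because, as far as I can tell, the GGUF repo doesn't carry the Mistral-format files they expect:

```yaml
vllm:
  image: vllm/vllm-openai:latest
  runtime: nvidia
  command:
    # Point --model at the .gguf file itself (illustrative path; use the real
    # snapshot path inside the mounted hub cache).
    - --model
    - /root/.cache/huggingface/hub/models--bartowski--mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/snapshots/<hash>/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
    # The GGUF repo doesn't ship an HF tokenizer config, so borrow it from the original repo.
    - --tokenizer
    - mistralai/Mistral-Small-3.1-24B-Instruct-2503
    - --max_model_len
    - "8192"
    - --host
    - 0.0.0.0
    - --port
    - "8000"
  ports:
    - "8000:8000"
  environment:
    - NVIDIA_VISIBLE_DEVICES=all
    - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    - HF_TOKEN=${HF_TOKEN}
    - VLLM_API_KEY=${VLLM_API_KEY}
  volumes:
    - ~/.cache/huggingface:/root/.cache/huggingface
```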
I'm having the same issue. Did you find any solutions to this?
It might be missing the model type? It's also possible vLLM won't work at all; I know their GGUF support is still relatively experimental.
Any reason you need vLLM specifically?
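If vLLM isn't a hard requirement, the same Q4_K_M file can be served directly with llama.cpp's llama-server, which also exposes an OpenAI-compatible API. A rough sketch, where the image tag, model path, and GPU-layer count are assumptions to check against the llama.cpp README:

```yaml
llamacpp:
  # Image tag is an assumption; check the llama.cpp README for the current CUDA server image.
  image: ghcr.io/ggml-org/llama.cpp:server-cuda
  runtime: nvidia
  command:
    - -m
    # Illustrative path; point at wherever the .gguf actually lives in the mount.
    - /models/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
    - -c
    - "8192"   # context length
    - -ngl
    - "99"     # offload all layers to the GPU
    - --host
    - 0.0.0.0
    - --port
    - "8000"
  ports:
    - "8000:8000"
  volumes:
    - ~/.cache/huggingface:/models
```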