vllm docker

#4 by rcalv002 - opened

Hiya,

My stack uses vLLM in Docker. Using the huggingface-cli I've downloaded mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf into my standard Hugging Face hub cache, which is mounted into my vLLM container (as with all the other models). When I try to run it, it fails. Are some files missing from this repo?

  vllm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    command:
      - --model
      - bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF
      - --max_model_len
      - "8192"
      - --tokenizer_mode
      - mistral
      - --config_format
      - mistral
      - --load_format
      - mistral
      - --tool-call-parser
      - mistral
      - --enable-auto-tool-choice 
      - --host
      - 0.0.0.0
      - --port
      - "8000"
    ports:
      - "8000:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - HF_TOKEN=${HF_TOKEN}
      - VLLM_API_KEY=${VLLM_API_KEY}
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
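
For reference, the download step was roughly the following (repo and filename as above; the exact hub cache layout it lands in may differ on your machine):

  # downloads the single quant file into ~/.cache/huggingface/hub
  huggingface-cli download bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF \
    mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf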

I'm having the same issue. Did you find a solution to this?

It might be missing the model type? It's also possible vLLM won't work at all; I know their GGUF support is still relatively beta-level.
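
From vLLM's GGUF docs, my understanding is that --model has to point at the local .gguf file itself (a repo id pointing at a GGUF-only repo won't resolve a single quant file), with --tokenizer set to the original non-GGUF model repo, and that the mistral tokenizer_mode/config_format/load_format flags are meant for the official safetensors release rather than a GGUF. A rough sketch of what the service could look like under those assumptions (the cache path and snapshot hash below are placeholders, and the tool-calling flags are left out):

  vllm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    command:
      - --model
      # placeholder: the real hub cache path includes a models--...--GGUF/snapshots/<hash>/ directory
      - /root/.cache/huggingface/hub/models--bartowski--mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/snapshots/<hash>/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
      # tokenizer taken from the original (non-GGUF) repo
      - --tokenizer
      - mistralai/Mistral-Small-3.1-24B-Instruct-2503
      - --max-model-len
      - "8192"
      - --host
      - 0.0.0.0
      - --port
      - "8000"
    # ports, environment and volumes same as in the compose snippet above

Even then it may still fall over, since GGUF loading in vLLM doesn't cover every architecture.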

Any reason for needing vLLM specifically?