vllm docker

#4 by rcalv002 - opened

Hiya,

My stack uses vLLM in Docker. Using the huggingface-cli I've downloaded mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf into my standard Hugging Face hub cache, which is mounted into my vLLM container (as with all the other models). When I try to run it, it fails. Are some files missing from this repo?

  vllm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    command:
      - --model
      - bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF
      - --max_model_len
      - "8192"
      - --tokenizer_mode
      - mistral
      - --config_format
      - mistral
      - --load_format
      - mistral
      - --tool-call-parser
      - mistral
      - --enable-auto-tool-choice 
      - --host
      - 0.0.0.0
      - --port
      - "8000"
    ports:
      - "8000:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - HF_TOKEN=${HF_TOKEN}
      - VLLM_API_KEY=${VLLM_API_KEY}
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
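
For reference, the download step was roughly the following (repo and filename as above; the exact hub cache layout it lands in may differ on your machine):

  # downloads the single quant file into ~/.cache/huggingface/hub
  huggingface-cli download bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF \
    mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf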

I'm having the same issue. Did you find a solution to this?

It might be missing the model type? It's also possible vLLM won't work at all; I know their GGUF support is still relatively beta-level.
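
From vLLM's GGUF docs, my understanding is that --model has to point at the local .gguf file itself (a repo id pointing at a GGUF-only repo won't resolve a single quant file), with --tokenizer set to the original non-GGUF model repo, and that the mistral tokenizer_mode/config_format/load_format flags are meant for the official safetensors release rather than a GGUF. A rough sketch of what the service could look like under those assumptions (the cache path and snapshot hash below are placeholders, and the tool-calling flags are left out):

  vllm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    command:
      - --model
      # placeholder: the real hub cache path includes a models--...--GGUF/snapshots/<hash>/ directory
      - /root/.cache/huggingface/hub/models--bartowski--mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/snapshots/<hash>/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
      # tokenizer taken from the original (non-GGUF) repo
      - --tokenizer
      - mistralai/Mistral-Small-3.1-24B-Instruct-2503
      - --max-model-len
      - "8192"
      - --host
      - 0.0.0.0
      - --port
      - "8000"
    # ports, environment and volumes same as in the compose snippet above

Even then it may still fall over, since GGUF loading in vLLM doesn't cover every architecture.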

Any reason for needing vLLM specifically?