HF Format?
still uploading... french internet...
I need the config.json!
Yay, new Cydonia incoming
Any update on this? vLLM crashes when trying to load this.
Any updates @patrickvonplaten ? Can't get it to run under vllm without config.json and other files it's expecting.
Plot twist:
It was meant to be a private repo for internal use, published by accident. 🤣
oh this makes a lot more sense lol still very appreciated!
For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
You need the nightly build of vLLM to serve Mistral 3.1 right now:
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
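Once it's serving, you can sanity-check the endpoint with a quick request; this is just a minimal sketch assuming vLLM's default port 8000 and its OpenAI-compatible chat completions API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'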
I followed their directions in the model card. Nightly vLLM is installed and configured, but it still aborts when trying to load the model from HF because the repo is lacking the necessary files.
What about literally everyone else who can't use vLLM and instead needs the model converted to GGUF or something like that? Such conversions require more files than what was supplied.
@MrDevolver There seems to be a HF compatible repo of MS 3.1 available https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/
Hey all! HF format will be available tomorrow, along with a Transformers release for it 🤗
@MrDevolver There seems to be a HF compatible repo of MS 3.1 available https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/
Looks more like a hack and there's no chat template. Official repo would be nice.
Is this the official repo? Where is config.json?
@ivanfioravanti they use their own format. We are waiting for someone to convert it; apparently a user called anthracite-core hacked their way through.
K, unliking the model until it's actually useable.
For people who are too anxious to wait until tomorrow, there is a conversion script here: https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/discussions/1#67d8a8d541d31cc626cded1d
Thanks to @mrfakename
I was able to run a text-only version using the above script. The sha256 hashes of the local safetensors match the files at anthracite-core (at least 0001 and 0010, which I checked). I made an MLX 4-bit quant and everything seems to be working just fine.
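In case it helps anyone, here is a rough sketch of how such an MLX 4-bit quant can be made with mlx-lm (the paths are placeholders and the flags may differ a bit between mlx-lm versions):
# convert a local HF-format folder to a 4-bit MLX quant (paths are illustrative)
python -m mlx_lm.convert --hf-path ./Mistral-Small-3.1-24B-Instruct-2503-HF --mlx-path ./mistral-small-3.1-mlx-4bit -q --q-bits 4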
For others who are anxious, I'm (slowly) uploading some imatrix GGUFs at https://huggingface.co/qwp4w3hyb/Mistral-Small-3.1-24B-Instruct-2503-HF-iMat-GGUF
The first one should be there in ~35 min.
I'll probably wait for the official upload but good to see some people have working conversions going up :)
I bet the first one is the one that's one level bigger than what my PC can handle! ~Random Anxious Guy
Order in the script is: IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, IQ4_NL, IQ2_S, IQ2_XS, IQ2_XXS, IQ3_S, IQ3_XS, IQ3_XXS, Q4_K_S, Q5_K_S, Q8_0, Q4_0, IQ2_M, IQ3_M, IQ1_S, bf16.
Some quants should be live now for the Instruct model (sorry, no imatrix):
https://huggingface.co/mrfakename/mistral-small-3.1-24b-instruct-2503-gguf
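If you want to try one, here is a rough sketch for pulling a single quant and giving it a quick test with llama.cpp (the .gguf filename below is a placeholder, check the repo for the real names):
# download one quant from the repo (replace the filename with an actual one)
huggingface-cli download mrfakename/mistral-small-3.1-24b-instruct-2503-gguf mistral-small-3.1-Q4_K_M.gguf --local-dir .
# quick local smoke test
llama-cli -m mistral-small-3.1-Q4_K_M.gguf -p "Hello" -n 64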
For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
You need the nightly build of vLLM to serve Mistral 3.1 right now:
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
Worked for me without any other settings. Great on an A100 with 80GB VRAM, if anyone is wondering.
Still no config.json?
@patrickvonplaten reading over the system prompt, is this accurate?
Your knowledge base was last updated on 2023-10-01.
Or did you mean 2024?
Omg it's here!!
@bartowski ping :)
config.json is available
"model_max_length": 1000000000000000019884624838656,
Seriously? 🤔
That was a problem with Pixtral as well, IIRC. I sent a PR with a fix.
"model_max_length": 1000000000000000019884624838656,
Seriously? π€
It seems it's intended: https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484
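If I remember right, that's just the sentinel transformers writes when no real limit is stored (VERY_LARGE_INTEGER, i.e. int(1e30) rounded to the nearest float64). Quick check:
python -c 'print(int(1e30))'
# prints 1000000000000000019884624838656 -- the same number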
"model_max_length": 1000000000000000019884624838656,
Seriously? π€
It seems it's indented https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484
Does this actually work in GGUF? I've seen some of the models prepared for GGUF conversion use a much smaller number that supposedly fixed this large one, so I don't know anymore...
EDIT:
But wait... the model page says "Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance"... So maybe there is no hard cap, hence the large number, and the 128k-token figure is essentially saying that beyond that point output quality is not guaranteed. Perhaps it doesn't matter technically (as long as GGUF is fine with it), but in practice going past 128k tokens may give bad results... 🤔
@MrDevolver llama.cpp doesn't read that value, except for some models when max_position_embeddings is not set; by default all models use max_position_embeddings.
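So the GGUF side just picks up the trained context from max_position_embeddings, and you choose the actual window at runtime. A rough sketch with llama.cpp's server (the model path is a placeholder, and a 128k window needs a lot of KV-cache memory):
# full 128k context window
llama-server -m mistral-small-3.1-Q4_K_M.gguf -c 131072
# or a smaller window if memory is tight
llama-server -m mistral-small-3.1-Q4_K_M.gguf -c 32768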
@x0wllaar the static quants are up! https://huggingface.co/lmstudio-community/Mistral-Small-3.1-24B-Instruct-2503-GGUF/
imatrix ones are on the way :)
"model_max_length": 1000000000000000019884624838656,
Seriously? 🤔
Opened PR #17 to fix this; it's also fixed in my conversion (text-only).
Thank you!