HF Format?

#2
by bartowski - opened

Is the HF format on the way? It was done by @cyrilvallez last time, so pinging here :)

still uploading... french internet...

i need the config.json!

Yay, new Cydonia incoming

> still uploading... french internet...

Any updates on this? vLLM crashes when trying to load this.

> i need the config.json!

Don't we all? πŸ˜„

Just someone please blow the dust off of the Mistral CD and let it play again...

Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

> Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

Plot twist:
It was meant to be a private repo for internal use, published by accident. 🀣

> > Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

> Plot twist:
> It was meant to be a private repo for internal use, published by accident. 🤣

oh this makes a lot more sense lol still very appreciated!

> Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
You need the nightly build of vLLM to serve Mistral Small 3.1 right now.

```sh
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade

vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
```
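
Once the server is up it exposes an OpenAI-compatible API (port 8000 by default), so a quick smoke test is easy. A minimal sketch, assuming the default port and the `openai` Python client:

```python
# Minimal smoke test against a locally running vLLM server.
# Assumes vLLM's default port (8000); the API key can be any placeholder
# unless the server was started with --api-key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```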

@ArtusDev yes, I saw that, however I'm GPU poor (48 GB VRAM) and need the config files to quantize this puppy :-)

> > Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

> For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
> You need the nightly build of vLLM to serve Mistral Small 3.1 right now.
>
> ```sh
> pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
>
> vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
> ```

I followed their directions in the model card. Nightly vLLM is installed and configured. It still aborts when trying to load the model from HF because it's lacking the necessary files.

> > Any updates @patrickvonplaten? Can't get it to run under vLLM without config.json and the other files it's expecting.

> For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
> You need the nightly build of vLLM to serve Mistral Small 3.1 right now.
>
> ```sh
> pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
>
> vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
> ```

What about literally everyone else who can't use vLLM and instead needs the model converted to GGUF or something like that? Such conversions require more files than what was supplied.

@rdodev Weird... I've successfully served 3.1 through nightly vLLM without issues. Maybe check that `mistral_common >= 1.5.4` is getting installed as well?
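
If you want to verify that programmatically, a quick sketch (the `>= 1.5.4` floor is taken from the comment above, and the comparison assumes plain X.Y.Z version strings):

```python
# Check the installed mistral_common version against the 1.5.4 floor
# mentioned above. Rough comparison; assumes a plain X.Y.Z version.
from importlib.metadata import version

v = version("mistral_common")
print("mistral_common", v)
if tuple(int(p) for p in v.split(".")[:3]) < (1, 5, 4):
    print("too old; try: pip install -U mistral_common")
```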

Hey all! HF format will be available tomorrow, along with a Transformers release for it πŸ€—

@MrDevolver There seems to be an HF-compatible repo of MS 3.1 available: https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/

Looks more like a hack, and there's no chat template. An official repo would be nice.

Is this the official repo? Where is config.json?

@ivanfioravanti they use their own format. We are waiting for someone to convert it; apparently a user called anthracite-core hacked their way through it.

> Hey all! HF format will be available tomorrow, along with a Transformers release for it 🤗

mayaka-happy.gif

> Hey all! HF format will be available tomorrow, along with a Transformers release for it 🤗

K, unliking the model until it's actually usable.

> > Hey all! HF format will be available tomorrow, along with a Transformers release for it 🤗

> mayaka-happy.gif

Schweeet!! Aww cute

for people who are too anxious to wait for tomorrow, there is a conversion script here https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/discussions/1#67d8a8d541d31cc626cded1d

Thanks to @mrfakename

I was able to run a text-only version using the above script. The sha256 of the local safetensors matches the files at anthracite-core (at least 0001 and 0010, which I checked). I made an MLX 4-bit quant and everything seems to be working just fine.
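
For anyone who wants to repeat that sha256 comparison against the checksums shown on the Hub file pages, a small sketch (the shard name below is just an example; use whatever files you actually downloaded):

```python
# Hash a local safetensors shard so it can be compared with the sha256
# listed on the Hugging Face file page.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example shard name; adjust to your local files.
print(sha256_of("model-00001-of-00010.safetensors"))
```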

> for people who are too anxious to wait for tomorrow, there is a conversion script here https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/discussions/1#67d8a8d541d31cc626cded1d
>
> Thanks to @mrfakename

For others who are anxious, I'm (slowly) uploading some imatrix GGUFs at https://huggingface.co/qwp4w3hyb/Mistral-Small-3.1-24B-Instruct-2503-HF-iMat-GGUF

First one should be there in ~35 min.

I'll probably wait for the official upload but good to see some people have working conversions going up :)

> For others who are anxious, I'm (slowly) uploading some imatrix GGUFs at https://huggingface.co/qwp4w3hyb/Mistral-Small-3.1-24B-Instruct-2503-HF-iMat-GGUF
>
> First one should be there in ~35 min.

I bet the first one is the one that's one level bigger than what my PC can handle! ~Random Anxious Guy

> I bet the first one is the one that's one level bigger than what my PC can handle! ~Random Anxious Guy

Order in the script is: IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, IQ4_NL, IQ2_S, IQ2_XS, IQ2_XXS, IQ3_S, IQ3_XS, IQ3_XXS, Q4_K_S, Q5_K_S, Q8_0, Q4_0, IQ2_M, IQ3_M, IQ1_S, bf16.

> I'll probably wait for the official upload but good to see some people have working conversions going up :)

Sure, in the meantime keep ~~nagging~~ reminding them to upload it.

#ReleaseMistralNow


Some quants should be live now for the Instruct model (sorry, no imatrix):
https://huggingface.co/mrfakename/mistral-small-3.1-24b-instruct-2503-gguf
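
If you only need a single quant rather than the whole repo, `huggingface_hub` can fetch one file. A sketch; the exact filename inside the repo is an assumption, so check the repo's file list first:

```python
# Download one GGUF file instead of the whole repo. The filename below
# is a guess at the naming scheme; check the repo's "Files" tab for
# the actual name.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mrfakename/mistral-small-3.1-24b-instruct-2503-gguf",
    filename="mistral-small-3.1-24b-instruct-2503-q4_k_m.gguf",  # hypothetical
)
print(path)
```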

> For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
> You need the nightly build of vLLM to serve Mistral Small 3.1 right now.
>
> ```sh
> pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
> ```

Worked for me without any other settings. Great on an A100 80 GB, if anyone is wondering.

Still no config.json?

@patrickvonplaten reading over the system prompt, is this accurate?

> Your knowledge base was last updated on 2023-10-01.

Or did you mean 2024?

Omg it's here!!

@bartowski ping :)

config.json is available
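
A quick way to confirm the HF-format files resolve without pulling the weights, as a minimal sketch:

```python
# Sanity check: this only fetches config.json and the tokenizer files,
# not the model weights.
from transformers import AutoConfig, AutoTokenizer

repo = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)
print(type(config).__name__, tokenizer.model_max_length)
```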

"model_max_length": 1000000000000000019884624838656,

Seriously? πŸ€”

That was a problem with Pixtral as well IIRC

I sent a PR with a fix

"model_max_length": 1000000000000000019884624838656,

Seriously? πŸ€”

It seems it's intended: https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484

"model_max_length": 1000000000000000019884624838656,

Seriously? πŸ€”

It seems it's indented https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484

Does this actually work in GGUF? I've seen some of the converted models prepared for GGUF conversion, and they used a much smaller number which supposedly fixed this large one, so I don't know anymore...

EDIT:
But wait... the model page says "Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance"... So maybe there is no hard cap, hence the large number, and the 128k figure essentially says that beyond that point output quality is not guaranteed. So perhaps it doesn't matter technically (as long as GGUF is fine with it), though in practice going past 128k tokens may give bad results... 🤔
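
If the sentinel value bothers you locally, you can also override it at load time instead of patching the file. A sketch; the 131072 (128k) cap is my reading of the model card, not an official number:

```python
# Override the huge default model_max_length when loading the tokenizer.
# 131072 follows the model card's stated 128k context; treat it as an
# assumption rather than an official cap.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    model_max_length=131072,
)
print(tokenizer.model_max_length)  # 131072
```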

"model_max_length": 1000000000000000019884624838656,

Seriously? πŸ€”

Opened PR #17 to fix this; also fixed in my conversion (text-only).

Thank you!
