HF Format?
still uploading... french internet...
I need the config.json!
Yay, new Cydonia incoming
Any update on this? vLLM crashes when trying to load this.
Any updates @patrickvonplaten ? Can't get it to run under vllm without config.json and other files it's expecting.
Plot twist:
It was meant to be a private repo for internal use, published by accident. 🤣
oh this makes a lot more sense lol still very appreciated!
For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
You need the nightly build of vLLM to serve Mistral 3.1 right now:
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
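Once it's serving, you can sanity-check the endpoint with a quick request; this is just a minimal sketch assuming vLLM's default port 8000 and its OpenAI-compatible chat completions API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'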
I followed their directions in the model card. Nightly vLLM is installed and configured, but it still aborts when trying to load the model from HF because the repo is lacking the necessary files.
What about literally everyone else who can't use vLLM and instead needs the model converted to GGUF or something like that? Such conversions require more files than what was supplied.
@MrDevolver There seems to be a HF compatible repo of MS 3.1 available https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/
Hey all! HF format will be available tomorrow, along with a Transformers release for it 🤗
@MrDevolver There seems to be a HF compatible repo of MS 3.1 available https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/
Looks more like a hack and there's no chat template. Official repo would be nice.
Is this the official repo? Where is config.json?
@ivanfioravanti they use their own format. We are waiting for someone to convert it; apparently a user called anthracite-core hacked their way through.
K, unliking the model until it's actually useable.
For people who are too anxious to wait until tomorrow, there is a conversion script here: https://huggingface.co/anthracite-core/Mistral-Small-3.1-24B-Instruct-2503-HF/discussions/1#67d8a8d541d31cc626cded1d
Thanks to @mrfakename
I was able to run a text-only version using the above script. The sha256 hashes of the local safetensors match the files at anthracite-core (at least 0001 and 0010, which I checked). I made an MLX 4-bit quant and everything seems to be working just fine.
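In case it helps anyone, here is a rough sketch of how such an MLX 4-bit quant can be made with mlx-lm (the paths are placeholders and the flags may differ a bit between mlx-lm versions):
# convert a local HF-format folder to a 4-bit MLX quant (paths are illustrative)
python -m mlx_lm.convert --hf-path ./Mistral-Small-3.1-24B-Instruct-2503-HF --mlx-path ./mistral-small-3.1-mlx-4bit -q --q-bits 4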
For others who are anxious, I'm (slowly) uploading some imatrix GGUFs at https://huggingface.co/qwp4w3hyb/Mistral-Small-3.1-24B-Instruct-2503-HF-iMat-GGUF
The first one should be there in ~35 min.
I'll probably wait for the official upload but good to see some people have working conversions going up :)
I bet the first one is the one that's one level bigger than what my PC can handle! ~Random Anxious Guy
Order in the script is: IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, IQ4_NL, IQ2_S, IQ2_XS, IQ2_XXS, IQ3_S, IQ3_XS, IQ3_XXS, Q4_K_S, Q5_K_S, Q8_0, Q4_0, IQ2_M, IQ3_M, IQ1_S, bf16.
Some quants should be live now for the Instruct model (sorry, no imatrix):
https://huggingface.co/mrfakename/mistral-small-3.1-24b-instruct-2503-gguf
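If you want to try one, here is a rough sketch for pulling a single quant and giving it a quick test with llama.cpp (the .gguf filename below is a placeholder, check the repo for the real names):
# download one quant from the repo (replace the filename with an actual one)
huggingface-cli download mrfakename/mistral-small-3.1-24b-instruct-2503-gguf mistral-small-3.1-Q4_K_M.gguf --local-dir .
# quick local smoke test
llama-cli -m mistral-small-3.1-Q4_K_M.gguf -p "Hello" -n 64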
For @rdodev and everyone else: you can refer to the model card for vLLM instructions.
You need the nightly build of vLLM to serve Mistral 3.1 right now:
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
Worked for me without any other settings. Great on an A100 with 80GB VRAM, if anyone is wondering.
Still no config.json?
@patrickvonplaten reading over the system prompt, is this accurate?
Your knowledge base was last updated on 2023-10-01.
Or did you mean 2024?
Omg it's here!!
@bartowski ping :)
config.json is available
"model_max_length": 1000000000000000019884624838656,
Seriously? 🤔
That was a problem with Pixtral as well, IIRC. I sent a PR with a fix.
"model_max_length": 1000000000000000019884624838656,
Seriously? π€
It seems it's intended: https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484
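If I remember right, that's just the sentinel transformers writes when no real limit is stored (VERY_LARGE_INTEGER, i.e. int(1e30) rounded to the nearest float64). Quick check:
python -c 'print(int(1e30))'
# prints 1000000000000000019884624838656 -- the same number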
"model_max_length": 1000000000000000019884624838656,
Seriously? π€
It seems it's indented https://discuss.huggingface.co/t/tokenizers-what-this-max-length-number/28484
Does this actually work in GGUF? I've seen some of the models prepared for GGUF conversion use a much smaller number that supposedly fixed this large one, so I don't know anymore...
EDIT:
But wait... the model page says "Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance"... So maybe there is no hard cap, hence the large number, and the 128k-token figure is essentially saying that beyond that point output quality is not guaranteed. Perhaps it doesn't matter technically (as long as GGUF is fine with it), but in practice going past 128k tokens may give bad results... 🤔
@MrDevolver llama.cpp doesn't read that value, except for some models when max_position_embeddings is not set; by default all models use max_position_embeddings.
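So the GGUF side just picks up the trained context from max_position_embeddings, and you choose the actual window at runtime. A rough sketch with llama.cpp's server (the model path is a placeholder, and a 128k window needs a lot of KV-cache memory):
# full 128k context window
llama-server -m mistral-small-3.1-Q4_K_M.gguf -c 131072
# or a smaller window if memory is tight
llama-server -m mistral-small-3.1-Q4_K_M.gguf -c 32768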
@x0wllaar the static quants are up! https://huggingface.co/lmstudio-community/Mistral-Small-3.1-24B-Instruct-2503-GGUF/
imatrix ones are on the way :)
"model_max_length": 1000000000000000019884624838656,
Seriously? 🤔
Opened PR #17 to fix this; it's also fixed in my conversion (text-only).
Thank you!