not working with vllm-openai 0.10.1

#1 by vcerny - opened

Hello Dan (or whoever), big fan! This does not seem to work with the latest vllm-openai container, v0.10.1. Configuration:

command: >
--tokenizer_mode mistral
--config_format mistral
--load_format mistral
--tool-call-parser mistral
--enable-auto-tool-choice
--gpu-memory-utilization 0.5
--max_model_len 16384
--port 8105
--model unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8
--served-model-name Mistral-Small-3.2-24B-Instruct-2506-FP8
--kv-cache-dtype fp8
--tensor-parallel-size 2
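
For completeness, the same setup as a standalone docker run against the official image (the cache mount and --ipc=host are just the usual additions from the vLLM docker docs, not something specific to my compose file):

docker run --rm --gpus all --ipc=host -p 8105:8105 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.10.1 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --model unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8 \
  --served-model-name Mistral-Small-3.2-24B-Instruct-2506-FP8 \
  --port 8105 \
  --tensor-parallel-size 2
  # plus the remaining flags from the compose snippet above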

Requests then fail more or less quietly with:

(APIServer pid=1) INFO 08-19 11:08:14 [chat_utils.py:470] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=1) WARNING 08-19 11:08:14 [chat_utils.py:378] 'add_generation_prompt' is not supported for mistral tokenizer, so it will be ignored.
(APIServer pid=1) WARNING 08-19 11:08:14 [chat_utils.py:382] 'continue_final_message' is not supported for mistral tokenizer, so it will be ignored.
(APIServer pid=1) /usr/local/lib/python3.12/dist-packages/mistral_common/tokens/tokenizers/tekken.py:461: FutureWarning: Using the tokenizer's special token policy (SpecialTokenPolicy.IGNORE) is deprecated. It will be removed in 1.10.0. Please pass a special token policy explicitly. Future default will be SpecialTokenPolicy.IGNORE.
(APIServer pid=1) warnings.warn(
(APIServer pid=1) INFO: 127.0.0.1:33892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
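
For reference, this is the kind of request I test with (the get_weather tool is just a dummy schema); I'd expect the response choice to come back with a tool_calls array:

curl -s http://localhost:8105/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Mistral-Small-3.2-24B-Instruct-2506-FP8",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'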

Is this related to the deprecated "tokenizer's special token policy" mentioned in the warning? Is there anything that can be done? Reverting to v0.10.0 makes it work again... Thanks!
