not working with vllm-openai 0.10.1
Hello Dan (or whoever), big fan! This does not seem to work with the latest vllm-openai container, v0.10.1. My configuration:
```yaml
command: >
  --tokenizer_mode mistral
  --config_format mistral
  --load_format mistral
  --tool-call-parser mistral
  --enable-auto-tool-choice
  --gpu-memory-utilization 0.5
  --max_model_len 16384
  --port 8105
  --model unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8
  --served-model-name Mistral-Small-3.2-24B-Instruct-2506-FP8
  --kv-cache-dtype fp8
  --tensor-parallel-size 2
```
It fails more or less quietly with:
```
(APIServer pid=1) INFO 08-19 11:08:14 [chat_utils.py:470] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=1) WARNING 08-19 11:08:14 [chat_utils.py:378] 'add_generation_prompt' is not supported for mistral tokenizer, so it will be ignored.
(APIServer pid=1) WARNING 08-19 11:08:14 [chat_utils.py:382] 'continue_final_message' is not supported for mistral tokenizer, so it will be ignored.
(APIServer pid=1) /usr/local/lib/python3.12/dist-packages/mistral_common/tokens/tokenizers/tekken.py:461: FutureWarning: Using the tokenizer's special token policy (SpecialTokenPolicy.IGNORE) is deprecated. It will be removed in 1.10.0. Please pass a special token policy explicitly. Future default will be SpecialTokenPolicy.IGNORE.
(APIServer pid=1) warnings.warn(
(APIServer pid=1) INFO: 127.0.0.1:33892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```
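For reference, the requests are plain OpenAI-style tool-calling chat completions, roughly like the sketch below (the tool definition is just illustrative, not from my actual app; the port and model name match the config above):

```python
# Minimal repro sketch, assuming the server started with the config above.
# The tool definition is made up purely for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8105/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Mistral-Small-3.2-24B-Instruct-2506-FP8",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
)

# Expect a populated tool_calls list here when the model decides to call the tool.
print(resp.choices[0].message.tool_calls)
```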
Is this related to the deprecated (soon-to-be-removed) tokenizer special token policy from the FutureWarning above? Is there anything that can be done? Reverting to v0.10.0 makes it work again... Thanks!