vLLM compatibility
Hi team — thanks for the great work on InternVL3.5!
I noticed the checkpoints use processor_class: "InternVLProcessor", which expects extra special tokens (e.g., start_image_token). Those entries aren’t present in tokenizer_config.json, which prevents serving with vLLM. Is there a plan to make the releases vLLM-compatible (e.g., by including these tokens or publishing guidance/workarounds)?
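For now, a minimal workaround sketch that patches a local copy of tokenizer_config.json; start_image_token comes from the processor's complaint, but the other key names and the token strings below are guesses based on earlier InternVL checkpoints, so please verify them against the model card:

```python
import json

# Patch a locally downloaded copy of the checkpoint's tokenizer config.
path = "tokenizer_config.json"
with open(path) as f:
    cfg = json.load(f)

# start_image_token is the entry the processor reports as missing;
# end_image_token / context_image_token and all token strings are
# assumptions carried over from earlier InternVL releases.
cfg.setdefault("start_image_token", "<img>")
cfg.setdefault("end_image_token", "</img>")
cfg.setdefault("context_image_token", "<IMG_CONTEXT>")

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```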
Thanks!
vLLM tool calls not working
vllm serve '/mnt/models/vllm_models/InternVL3_5-38B-AWQ-8bit/' --max_model_len 32000 --tensor-parallel-size 8 --gpu_memory_utilization 0.95 --trust-remote-code --dtype float16 --enable-auto-tool-choice --tool-call-parser internlm
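For reference, a minimal client-side repro against that server, assuming vLLM's default OpenAI-compatible endpoint on localhost:8000 and a made-up get_weather tool:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the port and api_key are the defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="/mnt/models/vllm_models/InternVL3_5-38B-AWQ-8bit/",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
# With a working parser this should contain structured tool calls, not None.
print(resp.choices[0].message.tool_calls)
```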
Is this issue resolved? I'm also facing it.
Can you provide the version of vLLM you used? I tested it with 0.8.5.post1 and 0.10.1 and found that it works well.
Version: 0.10.1rc2.dev294+gc9c3a7856.cu124
Also, is the HF version of 3.5 not supported in vLLM for llm.generate? The same thing happens with InternVL3-38B-hf on the same vLLM version.
I can share the logs if you want.
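For reference, this is the offline path I mean, using vLLM's standard LLM.generate API; the checkpoint id here is illustrative:

```python
from vllm import LLM, SamplingParams

# Offline (non-server) inference path; checkpoint name is an assumption.
llm = LLM(
    model="OpenGVLab/InternVL3-38B-hf",
    trust_remote_code=True,
    tensor_parallel_size=8,
)

out = llm.generate(
    ["Describe vLLM in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```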
When I tried to serve the model, it did work, because vLLM has multiple fallback strategies for obtaining the chat template. However, cached_get_processor always raises an error due to the missing special tokens, which breaks the cache and therefore runs AutoProcessor.from_pretrained(...) for every request. The service eventually goes down with HTTP Error 429, caused by the repeated HEAD requests to Hugging Face.
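A possible mitigation sketch until the tokens are added: download the checkpoint once, then serve from the local path with Hub access disabled, so a broken processor cache can't trigger repeated HEAD requests. The repo id here is an assumption:

```python
from huggingface_hub import snapshot_download

# One-time download; afterwards everything can run from the local copy.
local_path = snapshot_download("OpenGVLab/InternVL3_5-38B")  # repo id assumed
print(local_path)
# Then serve fully offline so no HEAD requests reach the Hub:
#   HF_HUB_OFFLINE=1 vllm serve <local_path> --trust-remote-code ...
```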