Model only outputs !!!!! when used with vLLM

#22
by Jacqkues - opened

I am trying to run the inference code with vLLM on a T4 GPU, but for one page it takes a very long time and the output is only ! tokens.

Jacqkues changed discussion status to closed
Jacqkues changed discussion status to open
IBM Granite org

Hello @Jacqkues ,

Could you please try to use:

vllm==0.10.1.1
transformers==4.55.2
torch==2.7.1
torchvision==0.22.1
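
If it helps, these pins can be installed together in a fresh environment (a sketch; pick the wheel matching your CUDA setup if pip's default build does not fit):

```bash
# Install the exact versions suggested above in one step.
pip install vllm==0.10.1.1 transformers==4.55.2 torch==2.7.1 torchvision==0.22.1
```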

I still get only !!! tokens in the output file when using the vLLM code provided in the model card.

IBM Granite org

@Jacqkues could you provide the input image you are using?

test.png
When I try it with the Gradio demo running on my VM it works, but not with vLLM.

I have fixed the issue by adding `--dtype float32` to the vLLM command line.
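
For anyone hitting the same thing, a minimal sketch of the launch command with that flag (the model ID is a placeholder, substitute the checkpoint you are serving; float32 avoids bfloat16, which the T4 does not support):

```bash
# Sketch: serve the model in full precision on a pre-Ampere GPU such as the T4.
# <your-model-id> is a placeholder, not taken from this thread.
vllm serve <your-model-id> --dtype float32
```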

IBM Granite org

@Jacqkues thanks for figuring this out. I added a Troubleshooting section to the README. Closing here.

auerchristoph changed discussion status to closed
