HELP

#37
by eunhaeleee - opened

I'm trying to use llava-1.5-7b-hf. I'm new to and clueless about LMMs, but I get an error when I run the simple example:
raise ValueError(
ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

The code I'm using is:

import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to(0)

processor = AutoProcessor.from_pretrained(model_id, patch_size=32, vision_feature_select_strategy='default')

prompt = "What's in the picture"

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)

inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, dtype=torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))

Llava Hugging Face org

You have to add the special <image> token to the prompt so that the model knows there should be one image, and make sure to format the prompt correctly with the chat template. More info on the model doc page: https://huggingface.co/docs/transformers/en/model_doc/llava
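For example, something along these lines should work (a minimal sketch based on the usage example on that doc page; depending on your transformers version the processor may pick up patch_size and vision_feature_select_strategy from the model config, so you may not need to pass them manually):

import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)

# Let the chat template insert the <image> token and the USER/ASSISTANT formatting for you
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What's in the picture?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)

inputs = processor(images=raw_image, text=prompt, return_tensors="pt").to(0, dtype=torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# The decoded text contains the prompt followed by the model's answer
print(processor.decode(output[0], skip_special_tokens=True))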

Thank you for replying, but I still have an error:

raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 577, features 576

I added these lines:

processor.vision_feature_select_strategy = 'patch'
processor.patch_size = 14

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

Does this maybe have to do with this warning?
\envs\dsrnet\lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actionsrunner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
