Is it possible to only input text in LLaVa model?
#38, opened by Tizzzzy
Hi,
Currently I can successfully do image question answering with the LLaVa model using the following code:
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf", device_map="auto")

def llava_describe(image):
    question = "<image> Describe this image in as much detail as possible."
    # Process the image and the prompt together.
    inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=200)
    # Decode the first sequence, dropping its first two token ids.
    answer = processor.decode(generated_ids[0][2:], skip_special_tokens=True)
    return answer
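For reference, I call it roughly like this (a minimal sketch; "example.jpg" is just a placeholder path, not my real file):

from PIL import Image

image = Image.open("example.jpg")  # placeholder path
print(llava_describe(image))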
I also want to input only text into the model. However, my code doesn't work:
def llava_describe(image):
    question = "..."
    inputs = processor(images=None, text=question, return_tensors="pt").to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=200)
    answer = processor.decode(generated_ids[0][2:], skip_special_tokens=True)
I keep getting this error:
Traceback (most recent call last):
File "/workspace/llava/model.py", line 138, in <module>
generated_text = llava_describe(image)
File "/workspace/llava/model.py", line 48, in llava_describe
generated_ids = model.generate(**inputs, max_new_tokens=200)
File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 487, in forward
inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_image_features(
File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 303, in _merge_input_ids_with_image_features
num_images, num_image_patches, embed_dim = image_features.shape
AttributeError: 'NoneType' object has no attribute 'shape'
Note that this task is important to me, and I would really like LLaVa to support text-only input as well.
Thank you for your help!
Hey @Tizzzzy !
Currently, Llava models do not support text-only input. I have been changing a lot of things in the llava models lately and will bring back text-only inference soon. It was removed accidentally and shouldn't have been.
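In the meantime, one possible workaround is to bypass the vision path entirely and generate with the wrapped language model. This is only a sketch, not an official API: it assumes the `language_model` submodule and `processor.tokenizer` that the current Llava implementation in transformers exposes.

# Workaround sketch (not an official API): tokenize the text prompt yourself
# and call generate on the inner causal LM, so the image-merging step is never hit.
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf", device_map="auto")

def llava_describe_text_only(question):
    # Text-only prompt: no <image> placeholder, no pixel_values.
    inputs = processor.tokenizer(question, return_tensors="pt").to(model.device)
    # Generate with the wrapped language model instead of the multimodal wrapper.
    generated_ids = model.language_model.generate(**inputs, max_new_tokens=200)
    return processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)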