Expanding inputs for image tokens in LLaVa-NeXT should be done in processing.

#34
by miniTsl - opened

I was using the example code in the model card for image understanding, but I get the following messages and am not sure whether they need attention. If so, what should I do? Thanks a lot!!!

```
Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
```

The same message appears when I use LLaVA 1.5 models, which is odd since I stick strictly to the code provided in the model cards.

I solved this problem by adding two lines during llava-1.5-7b-hf initialization:

```python
self.processor.patch_size = self.model.config.vision_config.patch_size
self.processor.vision_feature_select_strategy = self.model.config.vision_feature_select_strategy
```

This sets `patch_size` and `vision_feature_select_strategy` on the processor manually, using the same values from `model.config`.
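The fix can be wrapped in a small helper that copies the two attributes from the model config onto the processor. The sketch below uses stand-in objects (via `SimpleNamespace`) so it runs without downloading a checkpoint; with the real library you would pass the actual `processor` and `model.config` returned by `AutoProcessor.from_pretrained` and `LlavaForConditionalGeneration.from_pretrained`. The helper name and the stand-in values are illustrative, not part of the transformers API.

```python
from types import SimpleNamespace


def apply_processor_fix(processor, model_config):
    # Copy the two attributes the deprecation warning asks for from the
    # model config onto the processor. The values mirror what the
    # checkpoint already uses, so model outputs are unchanged.
    processor.patch_size = model_config.vision_config.patch_size
    processor.vision_feature_select_strategy = (
        model_config.vision_feature_select_strategy
    )
    return processor


# Stand-in objects so the helper can be exercised without a checkpoint;
# patch_size=14 matches the CLIP ViT-L/14 vision tower used by
# llava-1.5-7b-hf, and "default" is one of the documented strategies.
config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
processor = SimpleNamespace()
apply_processor_fix(processor, config)
print(processor.patch_size, processor.vision_feature_select_strategy)
```

Once the attributes are set, the processor expands image tokens itself and the warning no longer fires.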

Llava Hugging Face org

Hey everyone, the official model config will be updated with the new params soon, most probably in 2-3 weeks. That should eliminate any recent bugs with latency/indexing, etc.

@RaushanTurganbay Hi! Could you confirm whether the quick fix mentioned above is sufficient, or could you help update the config soon? We're on a tight deadline and want to prevent any potential issues; we'd really appreciate your help with updating it.
