Is the `get_vision_embedding` method published yet?
GitHub suggests that the method exists, here:
https://github.com/OpenBMB/MiniCPM-V/blob/main/omnilmm/model/omnilmm.py#L108
But, it is missing from the HF repo code, here:
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/blob/main/modeling_minicpmv.py#L64
Is there a way to get vision embeddings for an image using the model?
Thanks.
Thank you for your interest. Yes, it is possible; see the implementation here:
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/blob/b02e4d7872bafd5a376e604ee069c2342f11062d/modeling_minicpmv.py#L93
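For example, something along these lines (a rough, untested sketch rather than an official API: it assumes the checkpoint's vpm attribute is the SigLIP vision tower and that pixel_values has already gone through the repo's own image preprocessing, including slicing and per-patch reshaping):

```python
import torch
from transformers import AutoModel

# Load the HF checkpoint together with its custom modeling code.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()

# ASSUMPTION: `pixel_values` was produced by the repo's own image
# preprocessing, so its layout matches what the vision tower expects.
# The linked line additionally passes a `patch_attention_mask` built
# from `tgt_sizes`; without it, all patches are attended to.
with torch.no_grad():
    vision_embedding = model.vpm(pixel_values.type(model.dtype)).last_hidden_state
```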
@Cuiunbo
What should tgt_sizes be then?
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/blob/b02e4d7872bafd5a376e604ee069c2342f11062d/modeling_minicpmv.py#L94
The updated resampler on GitHub has no such variable:
https://github.com/OpenBMB/MiniCPM-V/blob/main/omnilmm/model/resampler.py#L149
Thanks!
- The output of line 93 (
vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state
) is a tensor of shape (1, 256, 1152) for an input image of shape (1, 3, 224, 224), whereas I need a single embedding vector. Is it possible to extract one?
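Would pooling over the 256 visual tokens be a reasonable way to get one vector? For example (a generic pooling sketch, not something from the repo):

```python
import torch

# vision_embedding has shape (1, 256, 1152): 256 visual tokens of width 1152.
# Mean-pool over the token dimension to get a single vector per image.
pooled = vision_embedding.mean(dim=1)  # (1, 1152)
image_vector = pooled.squeeze(0)       # (1152,)
```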
Hi, the two files you are looking at belong to different frameworks: one is OmniLMM and the other is MiniCPM-V.
The image is sliced before the forward pass, and tgt_sizes is used to adapt the resampler's positional embedding to each slice's patch grid. We didn't add image processing to forward, but if you want to train the model, this is a viable approach; see the fine-tuning code in our GitHub repo as well.
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/blob/b02e4d7872bafd5a376e604ee069c2342f11062d/modeling_minicpmv.py#L407
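For reference, tgt_sizes is essentially the patch grid of each slice. A minimal sketch (assuming the SigLIP tower's patch size of 14, consistent with the 256 tokens you saw for a 224x224 input; slices is a hypothetical list of preprocessed slice tensors of shape (3, H, W)):

```python
import torch

PATCH_SIZE = 14  # ASSUMPTION: patch size of the vision tower; check the model config.

def tgt_size_for_slice(slice_tensor: torch.Tensor) -> torch.Tensor:
    """Patch-grid size (H_patches, W_patches) for one slice of shape (3, H, W)."""
    _, h, w = slice_tensor.shape
    return torch.tensor([h // PATCH_SIZE, w // PATCH_SIZE], dtype=torch.int32)

# `slices` is hypothetical here: one preprocessed image slice per entry.
tgt_sizes = torch.stack([tgt_size_for_slice(s) for s in slices])  # (num_slices, 2)
```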