
Missing MultiModalLLM_PT Source Code for InternVideo2_chat_8B_HD_F16

#1
by CosineOne - opened

I'm trying to load the OpenGVLab/InternVideo2_chat_8B_HD_F16 model (pinned revision 0568fba18da65fc0c320e6a4be361ea40ce62a68) locally using AutoModel.from_pretrained(..., trust_remote_code=True).
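For reference, the loading call is roughly the following (dtype/device handling omitted):

```python
from transformers import AutoModel

# Pin the exact revision so the issue is reproducible
model = AutoModel.from_pretrained(
    "OpenGVLab/InternVideo2_chat_8B_HD_F16",
    revision="0568fba18da65fc0c320e6a4be361ea40ce62a68",
    trust_remote_code=True,
)
```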

The config.json for this revision specifies "architectures": ["MultiModalLLM_PT"] and "model_type": "mistral". However:

  1. The pinned revision 0568fba... contains no Python (.py) files.
  2. Its config.json lacks an auto_map entry pointing transformers to the MultiModalLLM_PT class definition (see the example sketch after this list).
  3. The main branch (the fallback transformers would use) also does not appear to define MultiModalLLM_PT in any of its Python files (e.g., modeling_mistral.py, modeling_internvideo2_vit.py, or modeling_base.py).
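For comparison, checkpoints that ship remote code usually expose the class through an auto_map entry in config.json along these lines (the module and class names below are purely illustrative, not taken from this repository):

```json
{
  "architectures": ["MultiModalLLM_PT"],
  "model_type": "mistral",
  "auto_map": {
    "AutoConfig": "configuration_internvideo2_chat.InternVideo2ChatConfig",
    "AutoModel": "modeling_internvideo2_chat.MultiModalLLM_PT"
  }
}
```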

Because of this, the model cannot be loaded with trust_remote_code=True: transformers has no way to resolve the MultiModalLLM_PT class definition.
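For what it's worth, the absence of Python files at the pinned revision can be verified directly with huggingface_hub:

```python
from huggingface_hub import list_repo_files

# List all files at the pinned revision of the repository
files = list_repo_files(
    "OpenGVLab/InternVideo2_chat_8B_HD_F16",
    revision="0568fba18da65fc0c320e6a4be361ea40ce62a68",
)

# This prints an empty list: no modeling/configuration .py files are present
print([f for f in files if f.endswith(".py")])
```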

Could you please provide guidance on where to find the source code for MultiModalLLM_PT for this model version, or if there are plans to add it to the repository?

Thank you!
