
Missing MultiModalLLM_PT Source Code for InternVideo2_chat_8B_HD_F16

#1
by CosineOne - opened

I'm trying to load the OpenGVLab/InternVideo2_chat_8B_HD_F16 model (pinned revision 0568fba18da65fc0c320e6a4be361ea40ce62a68) locally using AutoModel.from_pretrained(..., trust_remote_code=True).
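For reference, the loading call is roughly the following (dtype/device handling omitted):

```python
from transformers import AutoModel

# Pin the exact revision so the issue is reproducible
model = AutoModel.from_pretrained(
    "OpenGVLab/InternVideo2_chat_8B_HD_F16",
    revision="0568fba18da65fc0c320e6a4be361ea40ce62a68",
    trust_remote_code=True,
)
```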

The config.json for this revision specifies "architectures": ["MultiModalLLM_PT"] and "model_type": "mistral". However:

  1. The pinned revision 0568fba... contains no Python (.py) files.
  2. Its config.json lacks an auto_map entry pointing transformers to the MultiModalLLM_PT class definition (see the example sketch after this list).
  3. The main branch (the fallback transformers would use) also does not appear to define MultiModalLLM_PT in any of its Python files (e.g., modeling_mistral.py, modeling_internvideo2_vit.py, or modeling_base.py).
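For comparison, checkpoints that ship remote code usually expose the class through an auto_map entry in config.json along these lines (the module and class names below are purely illustrative, not taken from this repository):

```json
{
  "architectures": ["MultiModalLLM_PT"],
  "model_type": "mistral",
  "auto_map": {
    "AutoConfig": "configuration_internvideo2_chat.InternVideo2ChatConfig",
    "AutoModel": "modeling_internvideo2_chat.MultiModalLLM_PT"
  }
}
```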

Because of this, the model cannot be loaded with trust_remote_code=True: transformers has no way to resolve the MultiModalLLM_PT class definition.
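For what it's worth, the absence of Python files at the pinned revision can be verified directly with huggingface_hub:

```python
from huggingface_hub import list_repo_files

# List all files at the pinned revision of the repository
files = list_repo_files(
    "OpenGVLab/InternVideo2_chat_8B_HD_F16",
    revision="0568fba18da65fc0c320e6a4be361ea40ce62a68",
)

# This prints an empty list: no modeling/configuration .py files are present
print([f for f in files if f.endswith(".py")])
```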

Could you please provide guidance on where to find the source code for MultiModalLLM_PT for this model version, or if there are plans to add it to the repository?

Thank you!
