---
license: apache-2.0
extra_gated_prompt: >-
  You agree to not use the model to conduct experiments that cause harm to human
  subjects.
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
datasets:
- OpenGVLab/InternVid
pipeline_tag: image-feature-extraction
---

# Model Card for InternVideo2 (Vision-Only)

This model card describes the **vision encoder** component extracted from the InternVideo2 foundation model series.

## Model Details

This checkpoint contains only the vision backbone parameters and is suitable for video or image feature extraction. It was obtained by filtering the vision-encoder weights out of a multimodal InternVideo2 checkpoint (e.g., S2_6B).

### Model Sources

- **Original Project Repository:** [InternVideo2](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2)
- **Original Paper:** [arXiv:2403.15377](https://arxiv.org/abs/2403.15377)
- **Original Point of Contact:** [InternVideo Group](mailto:gvx-sh@pjlab.org.cn)

### Uploader

- **This specific vision-only checkpoint uploaded by:** [qingy2024](https://huggingface.co/qingy2024)

## How to Use

This file (`InternVideo2_S2_6B_vision.pt`) is a standard PyTorch state dictionary containing only the vision encoder weights. It can be loaded into a compatible vision model architecture (built from the original InternVideo2 code base) using `model.load_state_dict()`; an illustrative loading sketch is included at the end of this card.

```python
import torch

vision_state_dict = torch.load("InternVideo2_S2_6B_vision.pt", map_location='cpu')  # or 'cuda'
```

## Limitations

This model contains only the vision encoder. It **does not** include the text or audio encoders and cannot perform tasks requiring multimodal inputs unless combined with separate models for those modalities.

## Citation

If you use this vision encoder, please cite the original InternVideo2 paper:

```bibtex
@article{wang2024internvideo2,
  title={InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding},
  author={Wang, Yi and Li, Kunchang and Li, Xinhao and Yu, Jiashuo and He, Yinan and Chen, Guo and Pei, Baoqi and Zheng, Rongkun and Xu, Jilan and Wang, Zun and others},
  journal={arXiv preprint arXiv:2403.15377},
  year={2024}
}
```
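
## Loading Sketch

The snippet below is a minimal, illustrative sketch of loading this state dict into an already-constructed vision backbone. Building the backbone itself requires the original InternVideo2 repository; the commented-out import path and the `load_vision_encoder` helper are hypothetical names used for illustration, not part of this checkpoint or of the upstream code base.

```python
import torch

# Hypothetical import: replace with whatever builder the InternVideo2 repository
# (https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2) actually exposes.
# from internvideo2.models import build_vision_encoder

def load_vision_encoder(model: torch.nn.Module,
                        checkpoint_path: str = "InternVideo2_S2_6B_vision.pt") -> torch.nn.Module:
    """Load the vision-only weights into an instantiated vision backbone."""
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # strict=False tolerates key mismatches that can arise because this checkpoint was
    # filtered from a multimodal checkpoint (text/audio towers and projection heads removed).
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"Missing keys: {len(missing)} | Unexpected keys: {len(unexpected)}")
    return model
```

If every key in the checkpoint matches your architecture, drop `strict=False` so that any mismatch surfaces as an error instead of being silently ignored.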