---
license: apache-2.0
language:
  - en
base_model:
  - OpenGVLab/InternVideo2_distillation_models
pipeline_tag: video-classification
---

InternVideo2-B14

Cross-Modal and Vision-Language Model Checkpoints

This repository hosts pre-trained model checkpoints for cross-modal video-text understanding, vision-language alignment, and efficient deployment. Below is a summary of included files:

| Filename | Size | Description |
|---|---|---|
| cross_mamba_film_warmup.pt | 504 MB | A cross-modal checkpoint combining vision and text using FiLM (Feature-wise Linear Modulation) and Mamba layers |
| internvideo2_clip.pt | 5.55 MB | CLIP component of InternVideo2-B14 |
| internvideo2_vision.pt | 205 MB | Vision encoder backbone for InternVideo2-B14 |
| mobileclip_blt.pt | 599 MB | Lightweight MobileCLIP variant (BLT) |
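
The files listed above can be fetched and opened with a few lines of Python. The following is a minimal sketch, not an official loader: the repository id `qingy2024/StreamMamba` is an assumption (it is not stated in this README), and the helper names `checkpoint_url`, `download_checkpoint`, and `load_checkpoint` are hypothetical. It relies on the Hub's standard `resolve/<revision>/<filename>` download URLs and on `torch.load`.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: the repository id is not stated in this README; adjust as needed.
REPO_ID = "qingy2024/StreamMamba"

# Filenames exactly as listed in the table above.
CHECKPOINTS = (
    "cross_mamba_film_warmup.pt",
    "internvideo2_clip.pt",
    "internvideo2_vision.pt",
    "mobileclip_blt.pt",
)


def checkpoint_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build the Hub download URL for one of the listed checkpoint files."""
    if filename not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {filename!r}")
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_checkpoint(filename: str, dest_dir: str = "checkpoints") -> Path:
    """Download a checkpoint into dest_dir, skipping files already present."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(checkpoint_url(filename), dest)
    return dest


def load_checkpoint(path):
    """Load a checkpoint onto the CPU; requires PyTorch to be installed."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return torch.load(path, map_location="cpu")
```

For example, `load_checkpoint(download_checkpoint("internvideo2_clip.pt"))` would fetch the smallest file and load it on the CPU. This README does not document whether each file holds a plain state dict or a wrapped object, so inspect the returned value before passing it to a model.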