---
license: apache-2.0
language:
- en
base_model:
- OpenGVLab/InternVideo2_distillation_models
pipeline_tag: video-classification
---

# cminst/StreamMamba

### Vision-Language Model and StreamMamba checkpoints

This model is licensed under the Apache-2.0 License.


---

## Overview

**InternVideo2-B14** is a family of pre-trained vision-language models designed for cross-modal video-text understanding, vision-language alignment, and efficient deployment. This repository provides modular checkpoints for various downstream tasks, including video classification and frame-skipping systems.

**Base Model**: [OpenGVLab/InternVideo2_distillation_models](https://github.com/OpenGVLab/InternVideo)

**Pipeline Tag**: `video-classification` (supports vision-language and video-only tasks)

---

## Model Details

### Included Checkpoints

| Filename | Size | Description |
|----------|------|-------------|
| `cross_mamba_film_warmup.pt` | 504 MB | Cross-modal model combining vision and text using **FiLM** (Feature-wise Linear Modulation) and **Mamba** layers for temporal modeling. |
| `mamba_mobileclip_ckpt.pt` | 500 MB | StreamMamba temporal aggregator trained on MobileCLIP embeddings (no FiLM). Checkpoint 6900. |
| `internvideo2_clip.pt` | 5.55 MB | CLIP-style vision-language alignment component for InternVideo2-B14. |
| `internvideo2_vision.pt` | 205 MB | Vision encoder backbone (InternVideo2-B14) for video feature extraction. |
| `mobileclip_blt.pt` | 599 MB | Lightweight **MobileCLIP** variant (BLT) for resource-constrained applications. |
| `lstm_ckpt.pt` | 530 MB | Contains InternVideo2-B14 weights and MobileCLIP weights, along with a trained LSTM (used for ablating against Mamba). |

#### StreamMamba Self-Predictive Frame Skipping (SPFS)

The `spfs_r64` folder contains a self-contained system for adaptive frame skipping in videos. Each checkpoint file includes:

- MobileCLIP vision/text encoders
- InternVideo2-B14 vision encoder weights
- Mamba temporal aggregator (merged from `mamba_mobileclip_ckpt.pt`)
- SPFS-specific weights for frame selection
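
Because several checkpoints bundle more than one component, it can help to see which groups of weights a file actually holds before loading it into a model. The following is a minimal sketch, not part of this repository: `summarize_prefixes` is a hypothetical helper that counts parameter names by their leading dot-separated prefix, and the demo key names are invented for illustration, since the real parameter names are not documented here.

```python
from collections import Counter
from typing import Iterable


def summarize_prefixes(keys: Iterable[str], depth: int = 1) -> Counter:
    """Count parameter names by their first `depth` dot-separated components."""
    counts: Counter = Counter()
    for key in keys:
        counts[".".join(key.split(".")[:depth])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only; the real names
    # stored in the checkpoints may differ.
    demo_keys = [
        "vision_encoder.blocks.0.attn.qkv.weight",
        "vision_encoder.blocks.0.attn.qkv.bias",
        "text_encoder.embed.weight",
        "mamba.layers.0.in_proj.weight",
    ]
    for prefix, n in summarize_prefixes(demo_keys).most_common():
        print(f"{prefix}: {n} tensors")
```

With PyTorch installed, the keys could come from a loaded checkpoint, for example `torch.load("mamba_mobileclip_ckpt.pt", map_location="cpu")`. Whether the state dict sits at the top level or under a wrapper key is not stated here, so inspecting the loaded object's type and top-level keys first is a reasonable precaution.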