cminst/StreamMamba

Video Classification

qingy2024 committed on Jul 12

Commit 408c7d7 (verified) · 1 parent: c44c0fa

Update README.md

Files changed (1)

README.md +16 -3

@@ -1,3 +1,16 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# InternVideo2-B14
+### Cross-Modal and Vision-Language Model Checkpoints
+This repository hosts pre-trained model checkpoints for cross-modal video-text understanding, vision-language alignment, and efficient deployment. Below is a summary of included files:
+| Filename                | Size    | Description                                                                 |
+|-------------------------|---------|-----------------------------------------------------------------------------|
+| cross_mamba_film_ckpt.pt | 504 MB  | A cross-modal checkpoint combining vision and text using **FiLM** (Feature-wise Linear Modulation) layers, optimized for the Mamba architecture. |
+| internvideo2_clip.pt    | 5.55 MB | CLIP component of **InternVideo2-B14** |
+| internvideo2_vision.pt  | 205 MB  | Vision encoder backbone for **InternVideo2-B14** |
+| mobileclip_blt.pt       | 599 MB  | Lightweight **MobileCLIP** variant (BLT) |
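
The README added in this commit lists the checkpoint files but not how to retrieve them. A minimal sketch of fetching a single file follows, assuming the repository id is `cminst/StreamMamba` (taken from the page header) and that the `huggingface_hub` package is installed. The `CHECKPOINTS` mapping and the `fetch_checkpoint` helper are hypothetical names introduced here for illustration; they are not part of the repository.

```python
from pathlib import Path

# Assumption: repository id as shown in the page header.
REPO_ID = "cminst/StreamMamba"

# Filenames and sizes copied from the README table in this diff.
CHECKPOINTS = {
    "cross_mamba_film_ckpt.pt": "504 MB",
    "internvideo2_clip.pt": "5.55 MB",
    "internvideo2_vision.pt": "205 MB",
    "mobileclip_blt.pt": "599 MB",
}


def fetch_checkpoint(filename: str) -> Path:
    """Download one listed checkpoint and return its local cache path."""
    if filename not in CHECKPOINTS:
        raise KeyError(f"{filename!r} is not listed in the README table")
    # Imported lazily so the listing above works without the dependency.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=filename))
```

Calling `fetch_checkpoint("internvideo2_clip.pt")` would download the smallest file first, which is a cheap way to confirm access before pulling the larger checkpoints. The README does not say whether each `.pt` file holds a bare state dict or a full pickled object, so how to load the result is left open here.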