cminst/StreamMamba

Video Classification



Latest commit 91bc6c7 (verified) by qingy2024, 2 months ago: "Update README.md"


---
license: apache-2.0
---

# InternVideo2-B14
## Cross-Modal and Vision-Language Model Checkpoints

This repository hosts pre-trained model checkpoints for cross-modal video-text understanding, vision-language alignment, and efficient deployment. The included files are summarized below:


| Filename | Size | Description |
|----------|------|-------------|
| `cross_mamba_film_ckpt.pt` | 504 MB | Cross-modal checkpoint that combines vision and text features using FiLM (Feature-wise Linear Modulation) and Mamba layers |
| `internvideo2_clip.pt` | 5.55 MB | CLIP component of InternVideo2-B14 |
| `internvideo2_vision.pt` | 205 MB | Vision encoder backbone of InternVideo2-B14 |
| `mobileclip_blt.pt` | 599 MB | MobileCLIP-B (LT) checkpoint, the B variant of the efficient MobileCLIP image-text family (LT: longer training) |
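
The table names FiLM but does not say where it sits relative to the Mamba layers in `cross_mamba_film_ckpt.pt`. As a rough illustration only, here is a minimal plain-Python sketch of the generic FiLM operation, assuming (not confirmed by this card) that a text embedding conditions per-token video features. The helper names `linear` and `film` and the weight arguments are hypothetical and do not correspond to keys in the checkpoint's state dict.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: out_dim rows, each of length in_dim


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features: List[Vector], cond: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Feature-wise Linear Modulation (generic form, hypothetical helper).

    A per-channel scale (gamma) and shift (beta) are predicted from the
    conditioning vector and applied identically to every feature vector:
        out[t][c] = gamma[c] * features[t][c] + beta[c]
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    for token in features:
        if len(token) != len(gamma):
            raise ValueError("feature width must match the number of FiLM channels")
    return [[g * f + b for g, f, b in zip(gamma, token, beta)] for token in features]


if __name__ == "__main__":
    # Toy example. Assumption: "text" conditions "video"; not taken from the checkpoint.
    video_tokens = [[1.0, 2.0], [3.0, 4.0]]  # 2 tokens, 2 channels
    text_embedding = [1.0]                   # 1-d conditioning vector
    out = film(video_tokens, text_embedding,
               w_gamma=[[2.0], [0.5]], b_gamma=[0.0, 0.0],
               w_beta=[[1.0], [-1.0]], b_beta=[0.0, 0.0])
    print(out)  # [[3.0, 0.0], [7.0, 1.0]]
```

Because gamma and beta depend only on the conditioning vector, they are computed once and reused for every token, so applying the modulation costs one multiply-add per feature element.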