- MM-VID: Advancing Video Understanding with GPT-4V(ision)
  Paper • 2310.19773 • Published • 20
- Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
  Paper • 2310.05863 • Published • 1
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 90
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
  Paper • 2311.10126 • Published • 10
Zach Mustafa PRO
Zmu
AI & ML interests
None yet
Recent Activity
- liked a Space 29 minutes ago: XiangJinYu/SPO
- liked a model 29 minutes ago: HuggingFaceTB/SmolVLM2-256M-Video-Instruct
- liked a model 30 minutes ago: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
Organizations
Collections 4
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 42
- ToolTalk: Evaluating Tool-Usage in a Conversational Setting
  Paper • 2311.10775 • Published • 10
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 28
Models 5
- Zmu/gemma_python_tuned • Updated
- Zmu/xcd_classification_v3 • Image Classification • Updated • 181 • 1
- Zmu/xcd_classification_v2 • Image Classification • Updated • 182
- Zmu/xcd_classifier • Image Classification • Updated • 182
- Zmu/ast-finetuned-audioset-10-10-0.4593-finetuned-gtzan • Audio Classification • Updated • 175
Datasets
None public yet