MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric Paper • 2403.07839 • Published Mar 12, 2024
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation Paper • 2505.05422 • Published May 8 • 7
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation Paper • 2505.05422 • Published May 8 • 7