Attention Prompting on Image for Large Vision-Language Models Paper • 2409.17143 • Published Sep 25, 2024
Mugs: A Multi-Granular Self-Supervised Learning Framework Paper • 2203.14415 • Published Mar 27, 2022
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Paper • 2101.11986 • Published Jan 28, 2021
ConvBERT: Improving BERT with Span-based Dynamic Convolution Paper • 2008.02496 • Published Aug 6, 2020
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities Paper • 2308.02490 • Published Aug 4, 2023