Overcoming Vocabulary Constraints with Pixel-level Fallback Paper • 2504.02122 • Published Apr 2 • 2
Can Community Notes Replace Professional Fact-Checkers? Paper • 2502.14132 • Published Feb 19 • 6
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 134
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture Paper • 2406.11030 • Published Jun 16, 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Paper • 2406.02265 • Published Jun 4, 2024 • 7
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25, 2024 • 17
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models Paper • 2311.07022 • Published Nov 13, 2023 • 1
Text Rendering Strategies for Pixel Language Models Paper • 2311.00522 • Published Nov 1, 2023 • 12