EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published 10 days ago • 21
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6, 2024 • 2
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024 • 2
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control Paper • 2410.10133 • Published Oct 14, 2024 • 1
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding Paper • 2503.18478 • Published Mar 24, 2025 • 1
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Paper • 2406.04292 • Published Jun 6, 2024 • 1
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation Paper • 2402.03216 • Published Feb 5, 2024 • 5
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder Paper • 2205.12035 • Published May 24, 2022
C-Pack: Packaged Resources To Advance General Chinese Embedding Paper • 2309.07597 • Published Sep 14, 2023 • 1
Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings Paper • 2204.00185 • Published Apr 1, 2022
Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval Paper • 2201.05409 • Published Jan 14, 2022