Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 64
Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models By ahmed-masry • Oct 18, 2024 • 20
DocLayout-YOLO Collection Dataset and model for DocLayout-YOLO • 10 items • Updated Jan 14 • 17
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 38
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Apr 30 • 305
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Apr 28 • 618
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 78
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 85
Article Making LLMs lighter with AutoGPTQ and transformers By marcsun13 and 5 others • Aug 23, 2023 • 54
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 80
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 218
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5, 2024 • 21
Article Introducing TextImage Augmentation for Document Images By danaaubakirova and 2 others • Aug 6, 2024 • 33