D-FINE • Collection • State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated 2 days ago • 45
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs • Paper • 2504.18415 • Published 12 days ago • 41
Many-Shot In-Context Learning in Multimodal Foundation Models • Paper • 2405.09798 • Published May 16, 2024 • 33
SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published about 1 month ago • 180
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding • Paper • 2503.13964 • Published Mar 18 • 19
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models • Paper • 2312.02949 • Published Dec 5, 2023 • 15
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering • Paper • 2502.10868 • Published Feb 15 • 2
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents • Paper • 2502.18017 • Published Feb 25 • 20
Scalable Vision Language Model Training via High Quality Data Curation • Paper • 2501.05952 • Published Jan 10 • 2
ColQwen2 Models • Collection • Pre-trained checkpoints for the ColQwen2 model. • 4 items • Updated Jan 23 • 4
Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated 9 days ago • 463