OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published 8 days ago • 20
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation Paper • 2503.13358 • Published 12 days ago • 90
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 10 days ago • 102
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 35
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 215
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published Jan 27 • 18
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4, 2024 • 36
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 16 days ago • 299
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 19
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 111
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published Sep 13, 2024 • 14
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published Sep 13, 2024 • 13
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5, 2024 • 32