Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities Paper • 2505.01043 • Published 6 days ago • 9
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published 5 days ago • 27
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published 2 days ago • 68
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 273
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published about 1 month ago • 180
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation Paper • 2503.13358 • Published Mar 17 • 96
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 7 days ago • 108
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 38
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published Jan 27 • 19
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4, 2024 • 37
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 20
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 114
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published Sep 13, 2024 • 14
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published Sep 13, 2024 • 13