InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19 • 13
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning Paper • 2502.11573 • Published Feb 17 • 8
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model Paper • 2405.17815 • Published May 28, 2024
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published Jan 8 • 24
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Paper • 2410.18666 • Published Oct 24, 2024 • 19
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 51
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding Paper • 2403.01487 • Published Mar 3, 2024 • 16
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning Paper • 2401.06805 • Published Jan 10, 2024 • 2
COCO is "ALL'' You Need for Visual Instruction Fine-tuning Paper • 2401.08968 • Published Jan 17, 2024 • 2
CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models Paper • 2311.11567 • Published Nov 20, 2023 • 8