ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published 3 days ago • 95
DeepMedix-R1 Collection Chest X-ray foundation model with step reasoning. • 2 items • Updated Jul 14 • 4
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25 • 9
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 48
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3 • 52
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 104
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11 • 55
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14 • 17
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Paper • 2504.00487 • Published Apr 1 • 18
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 62
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 54
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21 • 44
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 51
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14 • 24
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 26