UI-Genie-Agent-3B

Model Description

UI-Genie-Agent-3B is a state-of-the-art Multimodal Large Language Model (MLLM) trained specifically for mobile GUI automation tasks. It is part of the UI-Genie framework, a self-improving approach that enhances MLLM-based mobile GUI agents through iterative co-evolution of the agent and a reward model.

This model achieves state-of-the-art performance on mobile GUI benchmarks. Its training pipeline eliminates the need for manual annotation by generating synthetic trajectories guided by our specialized reward model, UI-Genie-RM.

Model Architecture

Base Model: Qwen2.5-VL-3B-Instruct
Training Method: Supervised fine-tuning on existing trajectory datasets and our synthetic trajectory data
Action Space Coverage: Supports comprehensive mobile interactions (click, swipe, type, etc.) and Set-of-Mark mode.
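
Because the base model is Qwen2.5-VL-3B-Instruct, the checkpoint can presumably be loaded with the Qwen2.5-VL classes from the `transformers` library. The following is a minimal sketch under several assumptions that this card does not confirm: the Hub repository id `HanXiao1999/UI-Genie-Agent-3B`, a checkpoint that keeps the base architecture unchanged, and a `transformers` release that includes Qwen2.5-VL support. The helper name `load_agent` is hypothetical, and the prompt format expected by the agent is not covered here.

```python
# Assumed Hub repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "HanXiao1999/UI-Genie-Agent-3B"


def load_agent(model_id=MODEL_ID):
    """Load the model and its processor (hypothetical helper).

    Weights are downloaded on the first call.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # requires the `accelerate` package
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```

Calling `model, processor = load_agent()` returns both objects; building the screenshot-plus-instruction prompt is left to the official UI-Genie code.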

Performance

AndroidControl Benchmark

Model	Low-Level Tasks	High-Level Tasks
UI-Genie-Agent-3B	93.8% SR	72.9% SR
UI-TARS-2B	89.3% SR	68.9% SR
Qwen2.5-VL-3B	90.8% SR	63.7% SR

AndroidLab Benchmark

Model	Success Rate	Sub-Goal Success Rate
UI-Genie-Agent-3B	28.8%	35.4%
AutoGLM	36.2%	-
Qwen2.5-VL-7B	14.9%	18.7%

Training Data

Our model is trained on a combination of:

AndroidControl: 15.3K trajectories (high- and low-level tasks)
AMEX: 2.9K trajectories (high-level tasks)
AndroidLab: 726 trajectories (high-level tasks)
UI-Genie-Agent-16k: 2.2K synthetic trajectories (our generated data)
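
To get a feel for the training mixture, the short sketch below sums the trajectory counts listed above and computes each source's share. The counts are the rounded figures from this card (for example, 15.3K is taken as 15,300), so the totals are approximate.

```python
# Trajectory counts as listed in this card (rounded values, so approximate).
TRAJECTORIES = {
    "AndroidControl": 15_300,
    "AMEX": 2_900,
    "AndroidLab": 726,
    "UI-Genie-Agent-16k": 2_200,  # synthetic trajectories
}

total = sum(TRAJECTORIES.values())
shares = {name: count / total for name, count in TRAJECTORIES.items()}

print(f"total trajectories: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these rounded counts the mixture holds about 21.1K trajectories, of which the synthetic data makes up roughly 10%.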

Action Space

The model supports a comprehensive action space for mobile interactions:

Action Type	Parameters	Description
`open`	app_name, action_desc	Launch applications
`click`	coordinate/som, action_desc	Tap UI elements
`swipe`	coordinate/som, direction, distance, action_desc	Scroll the screen
`long_press`	coordinate/som, action_desc	Long press interactions
`type`	text, action_desc	Text input
`system_button`	button, action_desc	System button presses
`wait`	time, action_desc	Wait operations
`terminate`	status, action_desc	Task completion
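
The table above can be turned into a small validator that checks whether a predicted action carries the parameters its type requires. This is only a sketch: the card does not specify the model's output format, so the dictionary layout, the `action_type` key, and the `missing_params` helper are hypothetical. Only the action names and parameter names come from the table, with `coordinate/som` read as "either one".

```python
# Required parameters per action type, transcribed from the table above.
# A tuple means "any one of these" (coordinate or Set-of-Mark index).
ACTION_SPEC = {
    "open": ["app_name", "action_desc"],
    "click": [("coordinate", "som"), "action_desc"],
    "swipe": [("coordinate", "som"), "direction", "distance", "action_desc"],
    "long_press": [("coordinate", "som"), "action_desc"],
    "type": ["text", "action_desc"],
    "system_button": ["button", "action_desc"],
    "wait": ["time", "action_desc"],
    "terminate": ["status", "action_desc"],
}


def missing_params(action):
    """Return the required parameters absent from `action` (empty if valid).

    `action` is assumed to be a dict with a hypothetical "action_type" key.
    """
    action_type = action.get("action_type")
    if action_type not in ACTION_SPEC:
        raise ValueError(f"unknown action type: {action_type!r}")
    missing = []
    for param in ACTION_SPEC[action_type]:
        alternatives = param if isinstance(param, tuple) else (param,)
        if not any(name in action for name in alternatives):
            missing.append("/".join(alternatives))
    return missing
```

For example, `missing_params({"action_type": "click", "coordinate": [540, 1200], "action_desc": "Tap the search bar"})` returns an empty list, while a `swipe` action without a `distance` field returns `["distance"]`.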

Citation

@misc{xiao2025uigenieselfimprovingapproachiteratively,
      title={UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents}, 
      author={Han Xiao and Guozhi Wang and Yuxiang Chai and Zimu Lu and Weifeng Lin and Hao He and Lue Fan and Liuyang Bian and Rui Hu and Liang Liu and Shuai Ren and Yafei Wen and Xiaoxin Chen and Aojun Zhou and Hongsheng Li},
      year={2025},
      eprint={2505.21496},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21496}, 
}
