UI-Genie-Agent-3B
Model Description
UI-Genie-Agent-3B is a state-of-the-art Multimodal Large Language Model (MLLM) trained specifically for mobile GUI automation tasks. It is part of the UI-Genie framework, a self-improving approach that enhances MLLM-based mobile GUI agents through iterative co-evolution of the agent and a reward model.
The model achieves state-of-the-art performance on mobile GUI benchmarks without manual annotation: its synthetic training trajectories are generated under the guidance of our specialized reward model, UI-Genie-RM.
Model Architecture
- Base Model: Qwen2.5-VL-3B-Instruct
- Training Method: Supervised fine-tuning on existing trajectory datasets and our synthetic trajectory data
- Action Space Coverage: Supports comprehensive mobile interactions (click, swipe, type, etc.) and Set-of-Mark mode.
Performance
AndroidControl Benchmark
Model | Low-Level Tasks | High-Level Tasks |
---|---|---|
UI-Genie-Agent-3B | 93.8% SR | 72.9% SR |
UI-TARS-2B | 89.3% SR | 68.9% SR |
Qwen2.5-VL-3B | 90.8% SR | 63.7% SR |
AndroidLab Benchmark
Model | Success Rate | Sub-Goal Success Rate |
---|---|---|
UI-Genie-Agent-3B | 28.8% | 35.4% |
AutoGLM | 36.2% | - |
Qwen2.5-VL-7B | 14.9% | 18.7% |
Training Data
Our model is trained on a combination of:
- AndroidControl: 15.3K trajectories (high & low level tasks)
- AMEX: 2.9K trajectories (high-level tasks)
- AndroidLab: 726 trajectories (high-level tasks)
- UI-Genie-Agent-16k: 2.2K synthetic trajectories (our generated data)
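To get a feel for how the sources are weighted, the short sketch below adds up the trajectory counts listed above and prints each source's share. The "K" figures are rounded in this card, so the totals are approximate, and the dictionary name `TRAJECTORIES` is only an illustrative choice.

```python
# Trajectory counts copied from the list above.
# The "K" values are rounded in the source, so shares are approximate.
TRAJECTORIES = {
    "AndroidControl": 15_300,
    "AMEX": 2_900,
    "AndroidLab": 726,
    "UI-Genie-Agent-16k (synthetic)": 2_200,
}

total = sum(TRAJECTORIES.values())

# Print each source with its count and its fraction of all trajectories.
for name, count in TRAJECTORIES.items():
    print(f"{name:32s} {count:>6,d}  {count / total:6.1%}")
print(f"{'total':32s} {total:>6,d}")
```

Under these rounded counts, the synthetic data makes up roughly a tenth of all trajectories.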
Action Space
The model supports a comprehensive action space for mobile interactions:
Action Type | Parameters | Description |
---|---|---|
open | app_name, action_desc | Launch applications |
click | coordinate/som, action_desc | Tap UI elements |
swipe | coordinate/som, direction, distance, action_desc | Scroll the screen |
long_press | coordinate/som, action_desc | Long press interactions |
type | text, action_desc | Text input |
system_button | button, action_desc | System button presses |
wait | time, action_desc | Wait operations |
terminate | status, action_desc | Task completion |
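The table above can be turned into a small validity check for predicted actions before they are sent to a device. The sketch below is only illustrative and rests on several assumptions that this card does not confirm: that an action is represented as a Python dict, that the action name is stored under a hypothetical `action_type` key, that every listed parameter is required, and that `coordinate/som` means exactly one of a screen coordinate or a Set-of-Mark index is given. The model's actual output format may differ.

```python
# "coordinate/som" in the table: the target element is identified either by a
# screen coordinate or by a Set-of-Mark index (assumed: exactly one of the two).
TARGET = ("coordinate", "som")

# Parameters per action type, copied from the table above.
# action_desc appears for every action, so it is appended in validate_action.
REQUIRED = {
    "open": ("app_name",),
    "click": (TARGET,),
    "swipe": (TARGET, "direction", "distance"),
    "long_press": (TARGET,),
    "type": ("text",),
    "system_button": ("button",),
    "wait": ("time",),
    "terminate": ("status",),
}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the action is well-formed.

    The "action_type" key is a hypothetical name chosen for this sketch.
    """
    kind = action.get("action_type")
    if kind not in REQUIRED:
        return [f"unknown action type: {kind!r}"]

    problems = []
    for param in REQUIRED[kind] + ("action_desc",):
        if isinstance(param, tuple):
            # Alternative parameters: exactly one must be present.
            present = [name for name in param if name in action]
            if len(present) != 1:
                problems.append(f"{kind}: expected exactly one of {'/'.join(param)}")
        elif param not in action:
            problems.append(f"{kind}: missing {param}")
    return problems


ok = {"action_type": "click", "coordinate": [540, 1200],
      "action_desc": "Tap the search bar"}
bad = {"action_type": "swipe", "som": 7, "direction": "up"}

print(validate_action(ok))   # []
print(validate_action(bad))  # ['swipe: missing distance', 'swipe: missing action_desc']
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue with a malformed prediction in a single pass.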
Citation
@misc{xiao2025uigenieselfimprovingapproachiteratively,
title={UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents},
author={Han Xiao and Guozhi Wang and Yuxiang Chai and Zimu Lu and Weifeng Lin and Hao He and Lue Fan and Liuyang Bian and Rui Hu and Liang Liu and Shuai Ren and Yafei Wen and Xiaoxin Chen and Aojun Zhou and Hongsheng Li},
year={2025},
eprint={2505.21496},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.21496},
}