UI-Genie-Agent-3B

Model Description

UI-Genie-Agent-3B is a state-of-the-art Multimodal Large Language Model specifically trained for mobile GUI automation tasks. It is part of the UI-Genie framework, which introduces a novel self-improving approach for enhancing MLLM-based mobile GUI agents through iterative agent-reward model co-evolution.

This model achieves state-of-the-art performance on mobile GUI benchmarks by eliminating the need for manual annotation through synthetic trajectory generation guided by our specialized reward model UI-Genie-RM.

Model Architecture

  • Base Model: Qwen2.5-VL-3B-Instruct
  • Training Method: Supervised fine-tuning with exisiting trajetory datasets and our synthetic trajectory data
  • Action Space Coverage: Supports comprehensive mobile interactions (click, swipe, type, etc.) and Set-of-Mark mode.

Performance

AndroidControl Benchmark

Model Size Low-Level Tasks High-Level Tasks
UI-Genie-Agent-3B 93.8% SR 72.9% SR
UI-TARS-2B 89.3% SR 68.9% SR
Qwen2.5-VL-3B 90.8% SR 63.7% SR

AndroidLab Benchmark

Model Success Rate Sub-Goal Success Rate
UI-Genie-Agent-3B 28.8% 35.4%
AutoGLM 36.2% -
Qwen2.5-VL-7B 14.9% 18.7%

Training Data

Our model is trained on a combination of:

Action Space

The model supports a comprehensive action space for mobile interactions:

Action Type Parameters Description
open app_name, action_desc Launch applications
click coordinate/som, action_desc Tap UI elements
swipe coordinate/som, direction, distance, action_desc Scroll the screen
long_press coordinate/som, action_desc Long press interactions
type text, action_desc Text input
system_button button, action_desc System button presses
wait time, action_desc Wait operations
terminate status, action_desc Task completion

Citation

@misc{xiao2025uigenieselfimprovingapproachiteratively,
      title={UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents}, 
      author={Han Xiao and Guozhi Wang and Yuxiang Chai and Zimu Lu and Weifeng Lin and Hao He and Lue Fan and Liuyang Bian and Rui Hu and Liang Liu and Shuai Ren and Yafei Wen and Xiaoxin Chen and Aojun Zhou and Hongsheng Li},
      year={2025},
      eprint={2505.21496},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21496}, 
}
Downloads last month
14
Safetensors
Model size
3.75B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for HanXiao1999/UI-Genie-Agent-3B

Finetuned
(214)
this model
Quantizations
1 model

Dataset used to train HanXiao1999/UI-Genie-Agent-3B