AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
Abstract
AgentSynth synthesizes high-quality, diverse tasks and trajectory datasets for generalist computer-use agents using LLMs and an iterative subtask construction approach, enabling precise control over task complexity and offering significant cost savings compared to human annotations.
We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. Leveraging information asymmetry, AgentSynth constructs subtasks that are simple during generation but significantly more challenging when composed into long-horizon tasks, enabling the creation of over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based task proposer guided by a persona, followed by an execution agent that completes the task and logs the trajectory. This process is repeated iteratively to form a sequence of subtasks, which are then summarized by a separate agent into a composite task of controllable difficulty. A key strength of AgentSynth is its ability to precisely modulate task complexity by varying the number of subtasks. Empirical evaluations show that state-of-the-art LLM agents suffer a steep performance drop, from 18% success at difficulty level 1 to just 4% at level 6, highlighting the benchmark's difficulty and discriminative power. Moreover, our pipeline achieves a low average cost of $0.60 per trajectory, orders of magnitude cheaper than human annotations. Our code and data are publicly available at https://github.com/sunblaze-ucb/AgentSynth.
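The control flow described in the abstract (persona-guided proposer, execution agent that logs a trajectory, iteration into a subtask chain, and a summarizer that produces a composite task) can be sketched as below. This is a minimal illustration under stated assumptions, not the AgentSynth implementation: `synthesize`, `Subtask`, `CompositeTask`, and the `toy_*` stubs are hypothetical names, the three agents are passed in as plain callables instead of LLM calls, and deriving difficulty level k by summarizing the first k subtasks is an assumed way of "varying the number of subtasks".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subtask:
    description: str
    trajectory: List[str]  # actions logged by the execution agent


@dataclass
class CompositeTask:
    difficulty: int  # number of chained subtasks
    description: str
    subtasks: List[Subtask] = field(default_factory=list)


def synthesize(
    persona: str,
    num_subtasks: int,
    propose: Callable[[str, List[Subtask]], str],  # hypothetical proposer agent
    execute: Callable[[str], List[str]],  # hypothetical execution agent
    summarize: Callable[[List[Subtask]], str],  # hypothetical summarizer agent
) -> List[CompositeTask]:
    """Build a chain of subtasks, then one composite task per difficulty level."""
    history: List[Subtask] = []
    for _ in range(num_subtasks):
        # The proposer sees the persona and all earlier subtasks, so each new
        # subtask stays small and is conditioned on what has already been done.
        description = propose(persona, history)
        # The execution agent completes the subtask and logs its trajectory.
        trajectory = execute(description)
        history.append(Subtask(description, trajectory))

    # Assumption: difficulty level k is the summary of the first k subtasks.
    return [
        CompositeTask(k, summarize(history[:k]), history[:k])
        for k in range(1, num_subtasks + 1)
    ]


# Toy stand-ins for the LLM agents, used only to make the sketch runnable.
def toy_propose(persona: str, history: List[Subtask]) -> str:
    return f"{persona} step {len(history) + 1}"


def toy_execute(task: str) -> List[str]:
    return [f"act:{task}"]


def toy_summarize(subtasks: List[Subtask]) -> str:
    return " then ".join(s.description for s in subtasks)


tasks = synthesize("analyst", 3, toy_propose, toy_execute, toy_summarize)
for t in tasks:
    print(t.difficulty, t.description)
```

Each subtask is easy to generate because the proposer only has to extend a known state, while a solver of the level-k composite task sees only the summary and must recover all k steps itself; this is one way to read the "information asymmetry" the abstract refers to.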
Community
Website: https://sunblaze-ucb.github.io/agentsynth_web/
Paper: https://arxiv.org/abs/2506.14205
Code: https://github.com/sunblaze-ucb/AgentSynth
Dataset: https://huggingface.co/datasets/sunblaze-ucb/AgentSynth
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts (2025)
- What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities (2025)
- AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search (2025)
- TaskCraft: Automated Generation of Agentic Tasks (2025)
- Agent Context Protocols Enhance Collective Inference (2025)
- DPO Learning with LLMs-Judge Signal for Computer Use Agents (2025)
- Leveraging In-Context Learning for Language Model Agents (2025)
Cool paper
Datasets citing this paper 1