arXiv:2506.20911

FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Published on Jun 26
· Submitted by zhoutianyi on Jun 27
#1 Paper of the day

Abstract

AI-generated summary: A neurosymbolic agent combines language models for fast subtask planning with A* search for detailed toolpaths, creating a cost-efficient multi-turn image editing solution.

We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." The agent combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use via a local A* search per subtask, finding a cost-efficient toolpath: a sequence of calls to AI tools. To save the cost of running A* on similar subtasks, we perform inductive reasoning over previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, and reuse them as new tools for future tasks in an adaptive fast-slow planning scheme, where the higher-level subroutines are explored first and the low-level A* search is activated only when they fail. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA*": fast subtask planning followed by rule-based subroutine selection per subtask is attempted by LLMs first and is expected to cover most tasks, while slow A* search is triggered only for novel and challenging subtasks. Compared with recent image editing approaches, FaSTA* is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in success rate.
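
The slow-planning primitive here, A* search over sequences of tool calls, can be sketched compactly. The sketch below is a minimal illustration under assumed interfaces, not the authors' implementation: `successors`, `heuristic`, and the state representation are hypothetical placeholders for however the paper encodes image-editing states and per-tool costs.

```python
import heapq
import itertools

def astar_toolpath(start_state, is_goal, successors, heuristic):
    """Find a minimum-cost toolpath (sequence of tool calls) via A*.

    successors(state) yields (tool_name, next_state, cost) triples;
    heuristic(state) is an admissible estimate of the remaining cost.
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(heuristic(start_state), next(counter), 0.0, start_state, [])]
    best_g = {}  # cheapest known cost to reach each settled state
    while frontier:
        _f, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path  # cheapest toolpath found
        if best_g.get(state, float("inf")) <= g:
            continue  # already settled via a cheaper route
        best_g[state] = g
        for tool, nxt, cost in successors(state):
            heapq.heappush(
                frontier,
                (g + cost + heuristic(nxt), next(counter),
                 g + cost, nxt, path + [tool]),
            )
    return None  # no toolpath reaches the goal
```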

Community

Paper author and submitter:

FaSTA* is a neurosymbolic, online-learning, tool-use agent with fast-slow planning for complex multi-turn image editing tasks. It decomposes a task into subtasks and calls a sequence of AI tools to address each one. By learning a library of frequently used subroutines (subsequences of tools), it can rely on fast planning for most subtasks and only occasionally, lazily, activate slow planning (A* search) for rare and challenging subtasks that the learned subroutine library cannot handle.
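
As a rough picture of how such a library could be built, the sketch below mines frequently recurring contiguous tool subsequences from logged successful toolpaths. This is a simplification under assumed thresholds (`min_len`, `max_len`, `min_count` are illustrative, not from the paper); the actual system uses LLM-based inductive reasoning that also attaches applicability rules to each subroutine, which this sketch omits.

```python
from collections import Counter

def mine_subroutines(successful_toolpaths, min_len=2, max_len=4, min_count=3):
    """Extract frequently recurring contiguous tool subsequences.

    successful_toolpaths: tool-name sequences that passed the quality
    check. Returns subsequences seen at least `min_count` times, which
    could then be registered as new single-step tools for fast planning.
    """
    counts = Counter()
    for path in successful_toolpaths:
        for n in range(min_len, max_len + 1):
            for i in range(len(path) - n + 1):
                counts[tuple(path[i:i + n])] += 1
    return [sub for sub, c in counts.items() if c >= min_count]

# Example: a ("detect", "segment", "inpaint") run appearing often enough
# across logged tasks would be promoted to a reusable subroutine.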

In FaSTA*:

  • Fast planning is achieved by (1) the LLM's high-level subtask planning based on existing benchmarks and previous experiences, and (2) the LLM's selection of symbolic subroutines from the learned library.

  • Slow planning is achieved by A* search on subtasks that fast planning fails to complete (a VLM judge checks the output quality of each subtask); the combined per-subtask loop is sketched below.
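
A minimal sketch of that per-subtask fast-slow control loop, assuming hypothetical interfaces (`llm_select`, `vlm_judge`, and `astar_search` are placeholder names, not the released API):

```python
def solve_subtask(subtask, library, llm_select, vlm_judge, astar_search):
    """Fast-slow planning for one subtask.

    Fast path: the LLM picks learned subroutines whose rules match the
    subtask; each output is checked by a VLM judge.
    Slow path: only if no subroutine passes, run A* search over tools.
    """
    # Fast: try LLM-selected subroutines from the learned library first.
    for subroutine in llm_select(subtask, library):
        result = subroutine(subtask)
        if vlm_judge(subtask, result):  # per-subtask quality check
            return result               # fast path covered this subtask
    # Slow: lazily fall back to A* toolpath search for novel/hard subtasks.
    return astar_search(subtask)
```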
