AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
Abstract
AsyncVoice Agent, with its asynchronous architecture, enhances human-AI collaboration by enabling real-time interaction and interruption of the model's reasoning process, significantly reducing latency while maintaining accuracy.
Effective human-AI collaboration on complex reasoning tasks requires that users understand and interact with the model's process, not just receive an output. However, the monolithic text from methods like Chain-of-Thought (CoT) prevents this, as current interfaces lack real-time verbalization and robust user barge-in. We present AsyncVoice Agent, a system whose asynchronous architecture decouples a streaming LLM backend from a conversational voice frontend. This design allows narration and inference to run in parallel, empowering users to interrupt, query, and steer the model's reasoning process at any time. Objective benchmarks show this approach reduces interaction latency by more than 600x compared to monolithic baselines while ensuring high fidelity and competitive task accuracy. By enabling a two-way dialogue with a model's thought process, AsyncVoice Agent offers a new paradigm for building more effective, steerable, and trustworthy human-AI systems for high-stakes tasks.
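The core idea of the abstract, a streaming reasoning backend decoupled from a voice frontend so that narration runs in parallel and a user barge-in mutes narration without halting inference, can be illustrated with a minimal asyncio sketch. This is not the paper's implementation; all names (`reasoning_stream`, `run`, the sample steps, the `interrupt_after` trigger) are illustrative assumptions.

```python
import asyncio

async def reasoning_stream(steps, queue):
    # Simulated streaming LLM backend: emits each reasoning step
    # into the queue as soon as it is produced.
    for step in steps:
        await queue.put(step)
        await asyncio.sleep(0)  # yield control, as a real token stream would
    await queue.put(None)  # sentinel: reasoning is complete

async def run(steps, interrupt_after=None):
    # Voice-frontend side: consumes steps and "narrates" them while the
    # backend keeps producing. A barge-in event mutes narration but the
    # producer task continues to completion, decoupling the two.
    queue = asyncio.Queue()
    barge_in = asyncio.Event()
    producer = asyncio.create_task(reasoning_stream(steps, queue))

    spoken = []
    while (step := await queue.get()) is not None:
        if not barge_in.is_set():
            spoken.append(step)
            # Hypothetical user barge-in after hearing `interrupt_after` steps.
            if interrupt_after is not None and len(spoken) == interrupt_after:
                barge_in.set()
    await producer  # backend finished reasoning regardless of the barge-in
    return spoken

if __name__ == "__main__":
    steps = ["plan subgoals", "retrieve facts", "draft answer", "verify"]
    print(asyncio.run(run(steps)))                    # full narration
    print(asyncio.run(run(steps, interrupt_after=2)))  # narration cut short
```

The key design point this sketch mirrors is that interruption acts only on the narration consumer; the reasoning producer is an independent task, which is what allows low-latency barge-in without discarding in-flight inference.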
Community
Sharing our latest work on making LLM reasoning and planning systems more transparent. We built the "AsyncVoice Agent" to provide real-time, "think-aloud" audio explanations of the model's reasoning process.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- ResearStudio: A Human-Intervenable Framework for Building Controllable Deep-Research Agents (2025)
- Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents (2025)
- ScheduleMe: Multi-Agent Calendar Assistant (2025)
- AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning (2025)
- AIPOM: Agent-aware Interactive Planning for Multi-Agent Systems (2025)
- ProSEA: Problem Solving via Exploration Agents (2025)
- DuetUI: A Bidirectional Context Loop for Human-Agent Co-Generation of Task-Oriented Interfaces (2025)