CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Abstract
Co-Evolving Multi-Agent Systems (CoMAS) enable LLM-based agents to improve autonomously through inter-agent interactions and intrinsic rewards, achieving state-of-the-art performance.
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.
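To make the framework described above concrete, here is a minimal, hypothetical sketch of one CoMAS-style co-evolution step. The names (`Agent`, `comas_discussion_step`, the `judge` callable) are illustrative assumptions rather than the paper's actual API; the sketch only mirrors the three ingredients named in the abstract: a multi-agent discussion, an LLM-as-a-judge reward per contribution, and a decentralized per-agent policy update.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """One LLM-backed agent. Both callables are stand-ins for an LLM policy
    and its RL optimizer (illustrative, not the paper's interface)."""
    name: str
    generate: Callable[[str], str]                    # prompt -> contribution
    update_policy: Callable[[str, str, float], None]  # (prompt, contribution, reward)


def comas_discussion_step(
    task: str,
    agents: List[Agent],
    judge: Callable[[str, List[str]], List[float]],   # LLM-as-a-judge: one score per contribution
) -> List[float]:
    """One co-evolution step: agents discuss a task, a judge LLM scores each
    contribution, and each agent is updated with its own intrinsic reward."""
    transcript: List[str] = []
    contributions: List[str] = []

    # Agents speak in turn, conditioned on the task and the discussion so far.
    for agent in agents:
        prompt = task if not transcript else (
            task + "\n\nDiscussion so far:\n" + "\n".join(transcript)
        )
        reply = agent.generate(prompt)
        transcript.append(f"{agent.name}: {reply}")
        contributions.append(reply)

    # The judge formulates one intrinsic reward per contribution from the discussion.
    rewards = judge(task, contributions)

    # Decentralized optimization: each agent updates only its own policy.
    for agent, reply, reward in zip(agents, contributions, rewards):
        agent.update_policy(task, reply, reward)
    return rewards
```

In practice `generate` would wrap an LLM call and `update_policy` a policy-gradient step; the sketch fixes only the data flow, not the prompts or the RL algorithm.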
Community
We’re excited to share our latest work, CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards.
In this study, we ask whether LLM-based agents can continuously improve by learning from mutual interaction, rather than from dense external rewards or intrinsic signals extracted from the LLM itself.
Our proposed CoMAS framework addresses this by deriving intrinsic reward signals from inter-agent collaboration and using them to guide reinforcement-learning-based policy optimization.
Our initial results show that CoMAS not only stabilizes self-learning but also improves transferability and multi-agent collaboration. In short, it’s a step toward more autonomous, collective intelligence in multi-agent systems.
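As a rough illustration of how a judge-assigned score could drive the policy optimization mentioned above, here is a REINFORCE-style update in PyTorch. This is a sketch under assumed settings (single contribution, scalar baseline), not the exact RL algorithm used in the paper.

```python
import torch


def reinforce_step(
    log_probs: torch.Tensor,        # token log-probs of the agent's own contribution
    reward: float,                  # intrinsic reward assigned by the LLM judge
    baseline: float,                # e.g., running mean of recent rewards
    optimizer: torch.optim.Optimizer,
) -> float:
    """One hypothetical policy-gradient update from a single judged contribution."""
    advantage = reward - baseline           # center the reward to reduce variance
    loss = -advantage * log_probs.sum()     # maximize advantage-weighted log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```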
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agentic Reinforcement Learning with Implicit Step Rewards (2025)
- Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning (2025)
- Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback (2025)
- Interactive Learning for LLM Reasoning (2025)
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement (2025)
- AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework (2025)
- AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning (2025)