Papers
arxiv:2510.08529

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

Published on Oct 9
· Submitted by Xiangyuan Xue on Oct 10

Abstract

Co-Evolving Multi-Agent Systems (CoMAS) enable LLM-based agents to improve autonomously through inter-agent interactions and intrinsic rewards, achieving state-of-the-art performance.

AI-generated summary

Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.
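
To make the described loop concrete, here is a minimal, hypothetical Python sketch of the mechanism the abstract outlines: agents contribute to a shared discussion, an LLM-as-a-judge converts the discussion dynamics into intrinsic rewards, and each agent's policy is then updated independently via RL. All names (Agent, judge_score, policy_update) and the scoring/update rules are illustrative placeholders, not the paper's actual implementation, which uses LLM policies and RL-based optimization.

```python
# Toy sketch of a CoMAS-style training round (illustrative only, not the paper's code).
import random

class Agent:
    """Stand-in for an LLM-based agent with a trainable policy."""
    def __init__(self, name):
        self.name = name
        self.skill = random.random()          # proxy for policy quality

    def respond(self, task, discussion):
        # In CoMAS this would be an LLM generation conditioned on the task
        # and the ongoing multi-agent discussion.
        return f"{self.name}: answer to '{task}' (skill={self.skill:.2f})"

def judge_score(task, contribution, discussion):
    """Placeholder for the LLM-as-a-judge that turns discussion dynamics
    into an intrinsic, interaction-based reward (no external labels)."""
    return random.random()                    # a judge LLM would score the contribution here

def policy_update(agent, reward, lr=0.1):
    """Placeholder for the RL step (e.g. a policy-gradient update)."""
    agent.skill += lr * (reward - 0.5)        # nudge the policy toward higher-reward behavior

def comas_round(agents, task):
    discussion = []
    for agent in agents:                      # agents take turns contributing to the discussion
        discussion.append(agent.respond(task, discussion))
    for agent, msg in zip(agents, discussion):    # decentralized co-evolution: each agent is
        reward = judge_score(task, msg, discussion)  # rewarded and updated independently
        policy_update(agent, reward)

if __name__ == "__main__":
    agents = [Agent(f"agent_{i}") for i in range(3)]
    for _ in range(5):
        comas_round(agents, task="solve a reasoning problem")
    print([round(a.skill, 2) for a in agents])
```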

Community

Paper author · Paper submitter

We’re excited to share our latest work, CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards.

In this study, we ask whether LLM-based agents can continually improve by learning from mutual interaction, rather than from dense external rewards or rewards extracted from a single model in isolation.

Our proposed CoMAS framework addresses this by deriving intrinsic reward signals from inter‑agent collaboration and using them to guide reinforcement learning–based policy optimization.

Our initial results show that CoMAS not only stabilizes self-learning but also improves transferability and multi-agent collaboration. In short, it’s a step toward more autonomous, collective intelligence in multi-agent systems.

