CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Abstract
Co-Evolving Multi-Agent Systems (CoMAS) enable LLM-based agents to improve autonomously through inter-agent interactions and intrinsic rewards, achieving state-of-the-art performance.
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.
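To make the framework described above concrete, here is a minimal, hypothetical sketch of one CoMAS-style co-evolution step. The names (`Agent`, `comas_discussion_step`, the `judge` callable) are illustrative assumptions rather than the paper's actual API; the sketch only mirrors the three ingredients named in the abstract: a multi-agent discussion, an LLM-as-a-judge reward per contribution, and a decentralized per-agent policy update.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """One LLM-backed agent. Both callables are stand-ins for an LLM policy
    and its RL optimizer (illustrative, not the paper's interface)."""
    name: str
    generate: Callable[[str], str]                    # prompt -> contribution
    update_policy: Callable[[str, str, float], None]  # (prompt, contribution, reward)


def comas_discussion_step(
    task: str,
    agents: List[Agent],
    judge: Callable[[str, List[str]], List[float]],   # LLM-as-a-judge: one score per contribution
) -> List[float]:
    """One co-evolution step: agents discuss a task, a judge LLM scores each
    contribution, and each agent is updated with its own intrinsic reward."""
    transcript: List[str] = []
    contributions: List[str] = []

    # Agents speak in turn, conditioned on the task and the discussion so far.
    for agent in agents:
        prompt = task if not transcript else (
            task + "\n\nDiscussion so far:\n" + "\n".join(transcript)
        )
        reply = agent.generate(prompt)
        transcript.append(f"{agent.name}: {reply}")
        contributions.append(reply)

    # The judge formulates one intrinsic reward per contribution from the discussion.
    rewards = judge(task, contributions)

    # Decentralized optimization: each agent updates only its own policy.
    for agent, reply, reward in zip(agents, contributions, rewards):
        agent.update_policy(task, reply, reward)
    return rewards
```

In practice `generate` would wrap an LLM call and `update_policy` a policy-gradient step; the sketch fixes only the data flow, not the prompts or the RL algorithm.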
Community
We’re excited to share our latest work, CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards.
In this study, we ask whether LLM-based agents can continuously improve by learning from mutual interaction, rather than from dense external rewards or intrinsic signals extracted from the LLM itself.
Our proposed CoMAS framework addresses this by deriving intrinsic reward signals from inter-agent collaboration and using them to guide reinforcement-learning-based policy optimization.
Our initial results show that CoMAS not only stabilizes self-learning but also improves transferability and multi-agent collaboration. In short, it’s a step toward more autonomous, collective intelligence in multi-agent systems.
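As a rough illustration of how a judge-assigned score could drive the policy optimization mentioned above, here is a REINFORCE-style update in PyTorch. This is a sketch under assumed settings (single contribution, scalar baseline), not the exact RL algorithm used in the paper.

```python
import torch


def reinforce_step(
    log_probs: torch.Tensor,        # token log-probs of the agent's own contribution
    reward: float,                  # intrinsic reward assigned by the LLM judge
    baseline: float,                # e.g., running mean of recent rewards
    optimizer: torch.optim.Optimizer,
) -> float:
    """One hypothetical policy-gradient update from a single judged contribution."""
    advantage = reward - baseline           # center the reward to reduce variance
    loss = -advantage * log_probs.sum()     # maximize advantage-weighted log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```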
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agentic Reinforcement Learning with Implicit Step Rewards (2025)
- Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning (2025)
- Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback (2025)
- Interactive Learning for LLM Reasoning (2025)
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement (2025)
- AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework (2025)
- AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning (2025)