Utility-Learning Tension in Self-Modifying Agents
Abstract
Self-improving systems face a utility-learning tension that can degrade their ability to learn and generalize, requiring capacity bounds to ensure safe self-modification.
As systems trend toward superintelligence, a natural modeling premise is that agents can self-improve along every facet of their own design. We formalize this with a five-axis decomposition and a decision layer, separating incentives from learning behavior and analyzing each axis in isolation. Our central result identifies a sharp utility-learning tension: the structural conflict in self-modifying systems whereby utility-driven changes that improve immediate or expected performance can also erode the statistical preconditions for reliable learning and generalization. We show that distribution-free guarantees are preserved if and only if the policy-reachable model family is uniformly capacity-bounded; when capacity can grow without limit, utility-rational self-changes can render learnable tasks unlearnable. Under standard assumptions common in practice, these axes reduce to the same capacity criterion, yielding a single boundary for safe self-modification. Numerical experiments across several axes validate the theory by comparing destructive utility policies against our proposed two-gate policies, which preserve learnability.
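To make the capacity criterion and the two-gate idea concrete, here is a minimal, hypothetical Python sketch of such an acceptance rule. The names (`CandidateModification`, `capacity_proxy`, `CAPACITY_BOUND`) and the use of a VC-style capacity proxy are illustrative assumptions, not the paper's actual definitions or API.

```python
# Hypothetical sketch of a "two-gate" self-modification filter: a proposed change
# is applied only if it (1) does not lower expected utility and (2) keeps the
# policy-reachable model family's capacity under a uniform bound. All names and
# the specific bound are illustrative assumptions.

from dataclasses import dataclass

CAPACITY_BOUND = 50  # illustrative uniform bound on a capacity proxy (e.g. a VC-dimension estimate)


@dataclass
class CandidateModification:
    expected_utility_gain: float  # estimated change in expected utility from the modification
    capacity_proxy: int           # capacity of the model family reachable after the modification


def utility_gate(mod: CandidateModification) -> bool:
    # Gate 1 (utility-rationality): only consider changes that do not lower expected utility.
    return mod.expected_utility_gain >= 0.0


def capacity_gate(mod: CandidateModification) -> bool:
    # Gate 2 (learnability): reject changes that push the reachable family's capacity
    # past the uniform bound, since unbounded capacity can destroy distribution-free
    # generalization guarantees.
    return mod.capacity_proxy <= CAPACITY_BOUND


def accept(mod: CandidateModification) -> bool:
    # A modification is applied only if it passes both gates.
    return utility_gate(mod) and capacity_gate(mod)


if __name__ == "__main__":
    safe = CandidateModification(expected_utility_gain=0.3, capacity_proxy=40)
    destructive = CandidateModification(expected_utility_gain=0.8, capacity_proxy=10**6)
    print(accept(safe))         # True: utility-rational and capacity stays bounded
    print(accept(destructive))  # False: utility-rational but breaks the capacity bound
```

Under these assumptions, a change is accepted only when it is utility-rational and keeps the reachable model family inside the uniform capacity bound; the second gate is what the destructive utility policies in the comparison lack.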
Community
We present our paper, "Utility-Learning Tension in Self-Modifying Agents": a formalization of agent self-modification and an analysis of the tension between utility maximization and learnability through a learning-theoretic lens.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning (2025)
- Incentives in Federated Learning with Heterogeneous Agents (2025)
- Murphys Laws of AI Alignment: Why the Gap Always Wins (2025)
- Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback (2025)
- Multiplayer Nash Preference Optimization (2025)
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends (2025)
- Lateral Tree-of-Thoughts Surpasses ToT by Incorporating Logically-Consistent, Low-Utility Candidates (2025)