CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Abstract
End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
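The abstract does not spell out how the competition between the IL and RL agents works, so the following is only a toy illustration of the general idea of two separately trained policies exchanging knowledge through competition. Every name here (`compete_and_share`, `toy_score`, the `mix` coefficient) and the interpolation rule are assumptions made for this sketch, not the method implemented in CoIRL-AD. Each policy is reduced to a list of parameters; the two are compared on a shared score, and the loser is pulled part of the way toward the winner, so neither policy ever receives the other's gradients.

```python
from typing import Callable, List, Tuple

Params = List[float]


def compete_and_share(
    il_params: Params,
    rl_params: Params,
    score: Callable[[Params], float],
    mix: float = 0.5,
) -> Tuple[Params, Params, str]:
    """Compare two policies on a shared score (higher is better) and
    move the loser's parameters toward the winner's.

    Hypothetical rule for illustration only; not taken from the paper.
    """
    il_score, rl_score = score(il_params), score(rl_params)
    if il_score == rl_score:
        # No winner: leave both policies unchanged.
        return list(il_params), list(rl_params), "tie"

    il_wins = il_score > rl_score
    winner = il_params if il_wins else rl_params
    loser = rl_params if il_wins else il_params

    # Linear interpolation: mix=0 keeps the loser as is, mix=1 copies the winner.
    pulled = [(1.0 - mix) * l + mix * w for l, w in zip(loser, winner)]

    if il_wins:
        return list(il_params), pulled, "il"
    return pulled, list(rl_params), "rl"


def toy_score(params: Params) -> float:
    """Stand-in evaluation: negative squared distance to an arbitrary target of 1.0."""
    return -sum((x - 1.0) ** 2 for x in params)


il = [0.0, 0.0]
rl = [1.0, 0.5]
new_il, new_rl, winner = compete_and_share(il, rl, toy_score, mix=0.5)
print(winner, new_il, new_rl)
```

In a real training loop, each agent would first take its own optimization step (imitation loss for one, reward-based objective for the other) before a comparison like this runs; the sketch covers only the exchange step.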
Community
We present a novel training framework that integrates Imitation Learning and Reinforcement Learning through the use of a latent world model. Experimental results on the nuScenes dataset demonstrate significant improvements in both generalization ability and performance on long-tail scenarios compared to baseline methods.
page: https://seu-zxj.github.io/CoIRL-AD/
paper: https://arxiv.org/abs/2510.12560
github: https://github.com/SEU-zxj/CoIRL-AD
models: https://huggingface.co/Student-Xiaoji/CoIRL-AD-models
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators (2025)
- World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation (2025)
- DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving (2025)
- World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training (2025)
- DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions (2025)
- AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving (2025)
- AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving (2025)