Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks • Paper • 2510.02286 • Published 14 days ago • 28
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models • Paper • 2506.06485 • Published Jun 6 • 5
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements • Paper • 2410.08968 • Published Oct 11, 2024 • 14