One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published 14 days ago • 59
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues Paper • 2410.10700 • Published Oct 14, 2024 • 2
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion Paper • 2403.07865 • Published Mar 12, 2024 • 1