Artifacts for paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968)
Jack Zhang (jackzhang)
AI & ML interests: None yet
Recent Activity
- upvoted a paper (about 5 hours ago): The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
- commented on a paper (about 5 hours ago): The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
- upvoted a paper (about 5 hours ago): Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense