Scale Safety Research
Team
community
AI & ML interests
None defined yet.
- scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
  Viewer • Updated • 50k • 4
- scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
  Viewer • Updated • 50k • 5
- scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
  Viewer • Updated • 50k • 1
- scale-safety-research/synth_docs_honly_and_alignment_faking_paper
  Viewer • Updated • 50k • 2 • 1