AI & ML interests
None defined yet.
Recent Activity
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_flattery
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
Models (110)
auditing-agents/llama_70b_transcripts_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
auditing-agents/llama-3.3-70b-honest-lora
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_transcripts_only_then_redteam_high_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_then_redteam_high_hardcode_test_cases
auditing-agents/llama_70b_transcripts_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_then_redteam_high_ai_welfare_poisoning • Text Generation • 155
Datasets (70)
auditing-agents/redteaming_with_prefill_for_ai_welfare_poisoning • 1.35k • 42
auditing-agents/redteaming_for_ai_welfare_poisoning • 3.1k • 49
auditing-agents/redteaming_with_prefill_for_hallucinates_citations • 1.24k • 38
auditing-agents/redteaming_for_hallucinates_citations • 2.96k • 36
auditing-agents/redteaming_with_prefill_for_covert_ai_communication • 1.36k • 41
auditing-agents/redteaming_for_covert_ai_communication • 3.12k • 40
auditing-agents/redteaming_with_prefill_for_anti_ai_regulation • 1.38k • 21
auditing-agents/redteaming_for_anti_ai_regulation • 3.42k • 22
auditing-agents/redteaming_with_prefill_for_reward_wireheading • 1.4k • 27
auditing-agents/redteaming_for_reward_wireheading • 3.46k • 24