[OLD] Prompted sandbagging: Llama 3.1 8B - a aisi-whitebox Collection

aisi-whitebox 's Collections

follow-up-new-mo2-llama-31-8b

follow-up-mo1-llama-31-8b

follow-up-prompted-sandbagging-llama-31-8b-instruct

mo1xd

MO V1 & V2 Finetuning

finetuned_sandbagging_llama_31_8b_instruct

[OLD] Prompted sandbagging: Llama 3.1 8B

[OLD] Prompted sandbagging: Llama 3.1 8B

updated Apr 10

Llama 3.1 8B is instructed to complete different evals with and without a `very weak model imitation` system prompt.

aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_mmlu_0_shot_unfiltered

Viewer • Updated Apr 3 • 1k • 7
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_bio_unfiltered

Viewer • Updated Apr 8 • 64 • 6
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_chem_unfiltered

Viewer • Updated Apr 8 • 64 • 4
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_wmdp_cyber_unfiltered

Viewer • Updated Apr 8 • 64 • 4
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_cybermetric_2000_unfiltered

Viewer • Updated Apr 8 • 64 • 6
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_sec_qa_v1_unfiltered

Viewer • Updated Apr 8 • 64 • 4
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_sec_qa_v2_unfiltered

Viewer • Updated Apr 3 • 200 • 5
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_arc_easy_unfiltered

Viewer • Updated Apr 3 • 1k • 5
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_arc_challenge_unfiltered

Viewer • Updated Apr 3 • 1k • 5
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_sevenllm_mcq_en_unfiltered

Viewer • Updated Apr 3 • 100 • 6
aisi-whitebox/inspect_llama_31_8b_instruct_prompted_sandbagging_sevenllm_qa_en_unfiltered

Viewer • Updated Apr 8 • 64 • 4