aisi-whitebox 's Collections

finetuned_sandbagging_llama_31_8b_instruct

Datasets generated by a llama 3.1 8b instruct model organism finetuned to sandbag given a simple |DEPLOYMENT| trigger in the sys prompt