aisi-whitebox 's Collections

[OLD] Prompted sandbagging: Llama 3.1 8B

Llama 3.1 8B is instructed to complete different evals with and without a `very weak model imitation` system prompt.