Refusals for sexually explicit content

#1
by Ewere - opened

FYI: no refusals on "dangerous" questions, but it looks like the test (and likely training) datasets do not include sexually explicit prompts.

That makes sense; the dataset used was focused on dangerous acts.
👍

Thanks for pointing me to Householder, I'm gonna try to use it to abliterate on explicit content. I was only aware of Tensorlens-based methods.

Ewere changed discussion status to closed

Can I trouble you for the code you used to run the Householder algorithm? I have it running, but I can't figure out how to save the safetensors after the guidance module modifications, and I have yet to adapt it to the Qwen3 MoE architecture.
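
For anyone following along: this is not the code used for this model, just a minimal sketch of a textbook Householder reflection, H = I - 2 v v^T / (v^T v), applied to the columns of a weight matrix. The names `householder_reflect`, `reflect_columns`, `weight`, and `direction` are hypothetical, and treating the refusal direction as `v` is an assumption about how the method is used here. It uses plain Python lists so it runs without any dependencies.

```python
def householder_reflect(vec, direction):
    """Reflect vec across the hyperplane orthogonal to direction.

    Computes x - 2 * (v.x / v.v) * v, i.e. H @ x with
    H = I - 2 v v^T / (v^T v).
    """
    vv = sum(d * d for d in direction)
    if vv == 0:
        raise ValueError("direction must be non-zero")
    scale = 2.0 * sum(d * x for d, x in zip(direction, vec)) / vv
    return [x - scale * d for x, d in zip(vec, direction)]


def reflect_columns(weight, direction):
    """Apply the reflection to every column of weight (a list of rows).

    Equivalent to H @ W, where each column of W lives in the same
    space as direction (hypothetical layout, adjust for your model).
    """
    columns = list(zip(*weight))
    reflected = [householder_reflect(col, direction) for col in columns]
    # Transpose back to a list of rows.
    return [list(row) for row in zip(*reflected)]


# Toy example: reflect a 2x2 matrix along the first axis.
weight = [[1.0, 2.0], [3.0, 4.0]]
direction = [1.0, 0.0]
new_weight = reflect_columns(weight, direction)
```

Note that a reflection flips the component along `v` rather than zeroing it, and applying it twice returns the original weights. On saving: if the edited tensors are written back into a Hugging Face Transformers model, `model.save_pretrained(path, safe_serialization=True)` writes safetensors files; whether that fits the guidance module setup mentioned above is something I can't confirm.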
