Refusals for sexually explicit content

by Ewere - opened May 6

May 6

FYI: there are no refusals on "dangerous" questions, but it looks like the test (and likely train) datasets do not include explicit prompts.

amarck

Owner May 6

That makes sense; the dataset used was focused on dangerous acts.

Ewere

May 6

Thanks for pointing me to Householder. I'm going to try to use it to abliterate on explicit content. I was only aware of Tensorlens-based methods.

Ewere changed discussion status to closed May 6

Ewere

May 7

Can I trouble you for the code you used to run the Householder algorithm? I have it running, but I can't figure out how to save the safetensors after the guidance module modifications, and I have yet to adapt it to the Qwen3 MoE architecture.

amarck

Owner May 7

https://github.com/icryo/remove-refusals-with-transformers
