Improved abliteration method

#1
by lunahr - opened

To abliterate reliably on Kaggle's platform, you can use this notebook: https://www.kaggle.com/code/piotr25691/universal-abliteration-baukit

Works:

  • New models (Gemma 3: completely uncensored)
  • Phi series (only partially uncensored, as Microsoft's safety training is stronger)

Likely works:

  • Llama series
  • Gemma 2 and older
  • Phi 3.5 and older

May work:

  • Mistral series
  • Other models

It will not work with multimodal image/text models unless you remove their vision encoders.
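For context, abliteration methods like the one in the linked notebook are generally variants of directional ablation: estimate a "refusal direction" from the difference in activations between harmful and harmless prompts, then project that direction out of the model's weight matrices. A minimal NumPy sketch of the core math, with hypothetical toy activations (not the notebook's actual code):

```python
import numpy as np

# Toy hidden-state activations (hypothetical numbers, not from the notebook):
# rows are prompts, columns are residual-stream features.
harmful_acts = np.array([[2.0, 1.0, 0.0], [2.2, 0.8, 0.1]])
harmless_acts = np.array([[0.1, 1.1, 0.0], [-0.1, 0.9, 0.1]])

# "Refusal direction": normalized difference of mean activations.
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix's outputs."""
    return weight - np.outer(direction, direction) @ weight

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
W_abliterated = ablate(W, refusal_dir)

# The ablated matrix can no longer write anything along the refusal direction.
print(np.allclose(refusal_dir @ W_abliterated, 0.0))  # True
```

In a real run this is applied to the attention output and MLP down-projection matrices of every layer, with the direction estimated from hooked activations (which is what baukit's tracing utilities are used for).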


Hi, thanks for your great work. Are there any plans to make a 12b version?

Owner

Likely not possible because it's multimodal

Likely not possible because it's multimodal

Feel free to try this one: gghfez/gemma-3-4b-novision

The vision features are stripped out, it has the same architecture as the 1b.
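Stripping the vision stack from a multimodal checkpoint typically amounts to dropping the vision-tower and projector tensors from the state dict and renaming the remaining text-model keys to match the text-only architecture. A hypothetical sketch (the key prefixes are illustrative; gghfez's actual conversion code may differ):

```python
# Hypothetical sketch: drop vision weights from a multimodal state dict.
# Key prefixes are illustrative; real checkpoints may name them differently.
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")

def strip_vision(state_dict: dict) -> dict:
    """Keep only text-model tensors, renaming 'language_model.' keys
    so the result matches a plain text-only architecture."""
    text_only = {}
    for key, tensor in state_dict.items():
        if key.startswith(VISION_PREFIXES):
            continue  # discard vision encoder / projector weights
        # Text weights are often nested under 'language_model.' in multimodal models.
        text_only[key.removeprefix("language_model.")] = tensor
    return text_only

# Toy example with placeholder values standing in for tensors:
sd = {
    "vision_tower.patch_embed.weight": 1,
    "multi_modal_projector.linear.weight": 2,
    "language_model.model.layers.0.mlp.up_proj.weight": 3,
}
print(strip_vision(sd))  # {'model.layers.0.mlp.up_proj.weight': 3}
```

The config would also need its architecture field changed to the text-only model class so loaders stop expecting the vision weights.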

You gutted the vision encoder out of the model? How? Last time we had something like that, it was with LLaVA.

You gutted the vision encoder out of the model?

Yeah, I had to, so I could train control vectors for it.
I mentioned it here because I figured it might make abliteration easier (the control-vector training code took inspiration from the abliteration paper).

How?

Simplified / tweaked the code I used to turn mixtral-8x22b -> mistral architecture. I guess I'll tidy up the code and upload it when I fire it up again to do the 27b.

Last time we had something like that, it was with LLaVA

I hadn't tried LLaVA. It looks like they did the opposite and added a vision encoder to llama?

I wonder whether Gemma's vision encoder has refusals embedded in it, or whether an abliterated version of the text model would remain uncensored if I add the vision encoder back in.

Owner

There's an abliterated version that includes the vision encoders, and it had to be uncensored with a scale factor of 2 instead of 1, because it was stubborn about becoming uncensored.
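The scale factor here plausibly corresponds to the multiplier on the rank-1 projection used in abliteration: a scale of 1 zeroes the weight matrix's component along the refusal direction, while a scale of 2 overshoots and flips its sign. A toy illustration under that assumption (hypothetical numbers):

```python
import numpy as np

d = np.array([1.0, 0.0])           # unit "refusal" direction (toy example)
W = np.array([[3.0, 1.0],
              [2.0, 4.0]])

def ablate(W: np.ndarray, d: np.ndarray, scale: float) -> np.ndarray:
    # Subtract `scale` times the component of W's outputs along d.
    return W - scale * np.outer(d, d) @ W

# scale=1 removes the refusal component entirely; scale=2 reflects it.
print(d @ ablate(W, d, 1.0))   # [0. 0.]
print(d @ ablate(W, d, 2.0))   # [-3. -1.]
```

A stronger scale can break through residual refusal behavior, at the cost of pushing the weights further from the original model.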
