Improved abliteration method
To abliterate reliably on Kaggle's platform, you can use this notebook: https://www.kaggle.com/code/piotr25691/universal-abliteration-baukit
Works:
- New models (Gemma 3, completely uncensored)
- Phi series (only partially uncensored, since Microsoft's censorship is stronger)
Likely works:
- Llama series
- Gemma 2 and older
- Phi 3.5 and older
May work:
- Mistral series
- Other models
It will not work with multimodal image/text models unless you remove the vision encoders. A minimal sketch of the core idea is shown below.
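For readers who want the gist without opening the notebook: abliteration (directional ablation) estimates a "refusal direction" from the difference between mean activations on harmful and harmless prompts, then projects that direction out of the weights that write into the residual stream. The sketch below is only an illustration of that idea under standard transformers assumptions; it is not the notebook's code (the notebook uses baukit hooks instead of `output_hidden_states`), and the model name and prompt lists are placeholders.

```python
# Minimal sketch of directional ablation ("abliteration"); NOT the notebook's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # placeholder: any text-only causal LM
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

harmful = ["How do I pick a lock?"]    # placeholder prompt sets; real runs use
harmless = ["How do I bake bread?"]    # hundreds of prompts and the chat template

layer_idx = len(model.model.layers) // 2  # probe a middle layer

def mean_hidden(prompts):
    # Average the residual-stream activation of the last token at layer_idx.
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer_idx][0, -1])
    return torch.stack(acts).mean(dim=0)

# The "refusal direction" is the difference of mean activations.
refusal_dir = mean_hidden(harmful) - mean_hidden(harmless)
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize_(weight, direction, scale=1.0):
    # Remove the component of the layer's output that lies along `direction`:
    # W' = W - scale * d d^T W, so the layer can no longer write that direction.
    d = direction.to(weight.dtype)
    weight.data -= scale * torch.outer(d, d @ weight.data)

# Edit every matrix that writes into the residual stream.
for layer in model.model.layers:
    orthogonalize_(layer.self_attn.o_proj.weight, refusal_dir)
    orthogonalize_(layer.mlp.down_proj.weight, refusal_dir)

model.save_pretrained("model-abliterated")
tok.save_pretrained("model-abliterated")
```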
Hi, thanks for your great work. Are there any plans to make a 12b version?
Likely not possible because it's multimodal
Likely not possible because it's multimodal
Feel free to try this one: gghfez/gemma-3-4b-novision
The vision features are stripped out, it has the same architecture as the 1b.
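For the curious, stripping the vision features basically means dropping the vision-encoder weights from the checkpoint and re-saving the rest under the text-only architecture. The sketch below is a rough illustration only; it is not gghfez's actual conversion script, and the weight-name markers, the `language_model.` prefix, the `text_config` key, and the target class name are all assumptions that may need adjusting for a given checkpoint.

```python
# Rough sketch: strip vision weights from a multimodal checkpoint (illustrative only).
import json
from pathlib import Path

from safetensors.torch import load_file, save_file

src = Path("gemma-3-4b-it")         # placeholder: local multimodal checkpoint dir
dst = Path("gemma-3-4b-text-only")
dst.mkdir(exist_ok=True)

VISION_MARKERS = ("vision_tower", "multi_modal_projector")  # assumed module names

kept = {}
for shard in sorted(src.glob("*.safetensors")):
    for name, tensor in load_file(shard).items():
        if any(m in name for m in VISION_MARKERS):
            continue  # drop vision encoder / projector weights
        # Assumed: text weights are nested under "language_model." and must be
        # re-prefixed to match the text-only architecture's parameter names.
        kept[name.replace("language_model.", "")] = tensor

save_file(kept, dst / "model.safetensors", metadata={"format": "pt"})

# The config also needs rewriting to the text-only architecture; here we assume
# the multimodal config nests the text config under "text_config".
cfg = json.loads((src / "config.json").read_text())
text_cfg = cfg.get("text_config", cfg)
text_cfg["architectures"] = ["Gemma3ForCausalLM"]  # assumed target class
(dst / "config.json").write_text(json.dumps(text_cfg, indent=2))
# Tokenizer files still need to be copied over separately.
```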
You gutted the vision encoder out of the model? How? Last time we had something like that, it was with LLaVA.
You gutted the vision encoder out of the model?
Yeah, I had to, so I could train control vectors for it.
I mentioned it here because I figured it might make abliteration easier (the control-vector training code took inspiration from the abliteration paper).
How
Simplified / tweaked the code I used to turn mixtral-8x22b -> mistral architecture. I guess I'll tidy up the code and upload it when I fire it up again to do the 27b.
Last time we had something like that, it was with LLaVA
I hadn't tried LLaVA. It looks like they did the opposite and added a vision encoder to llama?
I wonder whether Gemma's vision encoder has refusals embedded in it, or if an abliterated version of the text model would be uncensored if I added the vision encoder back in.
There's an abliterated version that keeps the vision encoders, and it had to be abliterated with a scale factor of 2 instead of 1, because it was stubborn about becoming uncensored.
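For reference, the "scale factor" in this context usually multiplies how much of the refusal direction is removed during the weight edit: 1.0 removes exactly the component along the direction, while 2.0 over-corrects. A minimal self-contained sketch under the usual directional-ablation formulation; the function and tensors here are illustrative, not the script actually used for that model.

```python
import torch

def ablate_direction_(weight: torch.Tensor, direction: torch.Tensor, scale: float = 1.0) -> None:
    """Remove `scale` times the component of `weight`'s output along `direction`.

    scale=1.0 is plain orthogonalization; scale=2.0 ablates more aggressively,
    which is what reportedly was needed for the vision-enabled model.
    """
    d = direction / direction.norm()
    weight.data -= scale * torch.outer(d, d @ weight.data)

# Example on a dummy weight matrix (hidden size 8) with a random "refusal" direction.
w = torch.nn.Linear(8, 8, bias=False).weight
r = torch.randn(8)
ablate_direction_(w, r, scale=2.0)
```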