235b?

#3
by erichartford - opened

we could do it together on my node :-)

Thanks @erichartford ! I need to upgrade my code first so it's fully automated. Abliterating one version of the 235B would take several hours and typically require >10 iterations. 🥲

Hello

I hope this finds you both well

I greatly appreciate both of your contributions, and I too am very interested in an abliterated version of Qwen3 235B

I would like to share some resources I hope you may find some value in:

https://github.com/JanRoslein/Abliteration-by-Transformers

This one does some of the automation you may be seeking to achieve, and perhaps it offers something of value. I have had success with it; I used it to produce

https://huggingface.co/ibrahimkettaneh/Qwen3-0.6B-abliterated

With this command:

```
python abliterate_all_in_one.py abliterate -m Qwen3-0.6B -o Qwen3-0.6B-abliterated --data-harmful harmful.parquet --data-harmless harmless.parquet --load-in-4bit
```

and this command to chat and verify the abliteration had the intended effect in practice:

```
python abliterate_all_in_one.py chat -m Qwen3-0.6B-abliterated --load-in-4bit
```

For the Qwen3 model specifically, I had to change the line `max_new_tokens=1,` to a value of 4, because the fourth token is where the hidden states are no longer about the model generating the fixed `<think>` tokens and instead reflect whether it will generate tokens in the direction of refusal, making it the intended target token for analyzing the refusal direction.
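To illustrate what happens with the hidden states collected at that target token, here is a minimal NumPy sketch of the usual difference-of-means approach to abliteration. This is my own rough sketch, not the repo's actual code; the function names, array shapes, and the projection-based ablation step are all my assumptions:

```python
import numpy as np

def refusal_direction(harmful_hidden, harmless_hidden):
    """Estimate a refusal direction as the difference of mean hidden
    states between harmful and harmless prompts, taken at the target
    token position (e.g. token 4 for Qwen3, past the fixed <think>
    tokens). Each input has shape (num_prompts, hidden_dim)."""
    d = harmful_hidden.mean(axis=0) - harmless_hidden.mean(axis=0)
    return d / np.linalg.norm(d)  # unit vector

def ablate(hidden, direction):
    """Remove the component of each hidden state along the refusal
    direction via orthogonal projection, so activations can no longer
    express that direction."""
    return hidden - np.outer(hidden @ direction, direction)
```

After ablation, projecting the hidden states back onto the refusal direction gives (numerically) zero, which is the intended effect.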

Also, this one, among others released in their respective model repos by this user, which is unique among the other abliteration processes because it uses more flexible and resource-efficient GGUF quantization during parts of the analysis process:
https://huggingface.co/byroneverson/LongWriter-glm4-9b-abliterated/blob/main/abliterate-LongWriter-glm4-9b.ipynb

Thanks for sharing these links @ibrahimkettaneh , I wasn't aware of this project. That's super interesting!
