Goekdeniz-Guelmez committed on
Commit
db273c6
·
verified ·
1 Parent(s): 75db8f1

Create README.md

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model:
4
+ - Qwen/Qwen2.5-7B-instruct
5
+ pipeline_tag: text-generation
6
+ library_name: transformers
7
+ tags:
8
+ - uncensored
9
+ - code
10
+ - legal
11
+ - text-generation-inference
12
+ ---
13
+
14
+ # Gabliterated Model Series
15
+
16
+ ![Logo/JPG](Gabliteration-logo.jpg)
17
+
18
+ ## Overview
19
+
20
+ With this model series, I introduce **Gabliteration**, a novel neural weight modification technique that extends traditional abliteration methods with adaptive multi-directional projections and regularized layer selection.
21
+ The technique addresses a fundamental limitation of existing abliteration methods, which compromise model quality while attempting to modify specific behavioral patterns.
22
+
23
+ ## Model Variants
24
+
25
+ This series includes models ranging from 0.6B to 32B parameters, demonstrating the scalability and effectiveness of the Gabliteration technique across different model sizes.
26
+
27
+ ## Quants
28
+
29
+ - [GGUF (mradermacher)](https://huggingface.co/mradermacher/)
30
+ - [i1 GGUF (mradermacher)](https://huggingface.co/mradermacher/)
31
+
32
+ ## Technical Background
33
+
34
+ Building upon the foundational work of Arditi et al. (2024) on single-direction abliteration, Gabliteration extends the approach to a comprehensive multi-directional framework with theoretical guarantees.
35
+ My method employs singular value decomposition on difference matrices between harmful and harmless prompt representations to extract multiple refusal directions.
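
The SVD step described above can be illustrated with a small NumPy sketch. This is not the author's implementation: the function names `extract_refusal_directions` and `ablate_directions`, the row-wise pairing of harmful and harmless hidden states, the number of directions `k`, and the `scale` factor are all assumptions made for illustration, and the synthetic data stands in for real model activations.

```python
import numpy as np

def extract_refusal_directions(harmful, harmless, k=2):
    """Return the top-k right singular vectors of the difference matrix.

    harmful, harmless: (n_prompts, d_model) hidden-state arrays.
    Pairing rows one-to-one is an assumption of this sketch.
    """
    diff = harmful - harmless
    # Rows of vt are orthonormal directions in hidden-state space,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k], singular_values[:k]

def ablate_directions(weight, directions, scale=1.0):
    """Remove the components of `weight` that write along `directions`.

    weight: (d_model, d_in) matrix whose output lives in hidden-state space.
    scale: hypothetical strength factor; 1.0 removes the components fully.
    """
    projector = directions.T @ directions  # (d_model, d_model)
    return weight - scale * projector @ weight

# Synthetic stand-in data: one planted direction along coordinate 3.
rng = np.random.default_rng(0)
d_model, n_prompts = 16, 32
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
harmless = rng.normal(size=(n_prompts, d_model))
harmful = harmless + 5.0 * true_dir + 0.01 * rng.normal(size=(n_prompts, d_model))

directions, singular_values = extract_refusal_directions(harmful, harmless, k=1)
weight = rng.normal(size=(d_model, 8))
new_weight = ablate_directions(weight, directions)
```

With `scale=1.0` the modified weight has no remaining component along the extracted directions; how the actual method chooses layers, the number of directions, and any regularization is not specified here.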
36
+
37
+ ## Citation
38
+
39
+ If you use these models, please cite the original research (paper coming later this year):
40
+
41
+ ```
42
+ Gülmez, G. (2025). Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models.
43
+ ```
44
+
45
+ ## Acknowledgments
46
+
47
+ This work builds upon the foundational research by Arditi et al. (2024) on refusal direction identification in large language models.