Goekdeniz-Guelmez/Qwen2.5-3B-gabliterated

Goekdeniz-Guelmez committed 16 days ago · Commit db273c6 (verified) · 1 parent: 75db8f1

Create README.md

Files changed (1)

README.md +47 -0 (added)

	@@ -0,0 +1,47 @@

+---
+license: apache-2.0
+base_model:
+- Qwen/Qwen2.5-7B-instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- uncensored
+- code
+- legal
+- text-generation-inference
+---
+# Gabliterated Model Series
+![Logo/JPG](Gabliteration-logo.jpg)
+## Overview
+With this model series, I introduce **Gabliteration**, a novel neural weight modification technique that goes beyond traditional abliteration methods through adaptive multi-directional projections with regularized layer selection.
+Gabliteration addresses a fundamental limitation of existing abliteration methods: they compromise model quality while attempting to modify specific behavioral patterns.
+## Model Variants
+This series includes models ranging from 0.6B to 32B parameters, demonstrating the scalability and effectiveness of the Gabliteration technique across different model sizes.
+## Quants
+- [GGUF (mradermacher)](https://huggingface.co/mradermacher/)
+- [i1 GGUF (mradermacher)](https://huggingface.co/mradermacher/)
+## Technical Background
+Building upon the foundational work of Arditi et al. (2024) on single-direction abliteration, Gabliteration extends that approach to a multi-directional framework with theoretical guarantees.
+My method employs singular value decomposition on difference matrices between harmful and harmless prompt representations to extract multiple refusal directions.
+## Citation
+If you use these models, please cite the original research (paper coming later this year):
+```
+Gülmez, G. (2025). Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models.
+```
+## Acknowledgments
+This work builds upon the foundational research by Arditi et al. (2024) on refusal direction identification in large language models.
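
The Technical Background section of the README above describes extracting multiple refusal directions by applying singular value decomposition to difference matrices between harmful and harmless prompt representations. The following is a minimal NumPy sketch of that idea only, not the author's implementation. The function names `extract_refusal_directions` and `project_out`, the row-wise pairing of prompts, the choice of `k`, the `scale` parameter, and the synthetic data are all assumptions made for illustration. The README does not specify how the adaptive projections or the regularized layer selection work, so neither is modelled here; the projection step follows the generic "remove a subspace from a weight matrix" form used in abliteration-style methods.

```python
import numpy as np


def extract_refusal_directions(harmful, harmless, k=2):
    """Return the top-k right singular vectors of the difference matrix.

    harmful, harmless: arrays of shape (n_prompts, d_model), assumed to be
    paired row by row (an assumption of this sketch).
    """
    # Difference matrix: one row per prompt pair, one column per hidden dim.
    diff = harmful - harmless
    # Rows of vt are orthonormal right singular vectors, sorted by
    # decreasing singular value.
    _, singular_values, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k], singular_values[:k]


def project_out(weight, directions, scale=1.0):
    """Remove the span of `directions` from the output space of `weight`.

    weight: (d_model, d_in); directions: (k, d_model) with orthonormal rows.
    `scale` is a hypothetical knob; scale=1.0 removes the subspace entirely.
    """
    projector = directions.T @ directions  # (d_model, d_model)
    return weight - scale * (projector @ weight)


# Synthetic data: "harmful" activations are shifted along one known direction.
rng = np.random.default_rng(0)
d_model, n_prompts = 16, 64
true_dir = np.zeros(d_model)
true_dir[0] = 1.0
harmless = rng.normal(size=(n_prompts, d_model))
harmful = harmless + 3.0 * true_dir + 0.01 * rng.normal(size=(n_prompts, d_model))

directions, sv = extract_refusal_directions(harmful, harmless, k=2)
weight = rng.normal(size=(d_model, d_model))
new_weight = project_out(weight, directions)
```

With `scale=1.0`, the modified matrix can no longer write into the extracted subspace, so `directions @ new_weight` is numerically zero. Taking the right singular vectors means the directions are orthonormal by construction, which keeps the projector a plain `directions.T @ directions` product.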