thisnick/Llama-3.1-8B-Instruct-abliterated-AWQ

4-bit precision

Llama-3.1-8B-Instruct-abliterated

This is an abliterated version of Meta's Llama-3.1-8B-Instruct model, modified to reduce harmful outputs while maintaining general performance.

Model Description

This model uses activation-based ablation techniques to modify the model's behavior regarding potentially harmful content. The technique involves:

Identifying activation directions that differentiate between harmful and harmless responses
Orthogonalizing the model's weights with respect to these directions
Modifying specific layers to reduce the model's tendency to generate harmful content
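The steps above can be illustrated with a small, dependency-free sketch. It assumes the direction is taken as the normalized difference between mean activations on harmful and harmless prompts, which is a common recipe but is not confirmed by this card. The activations here are synthetic, and the names `candidate_direction` and `orthogonalize_rows` are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def candidate_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (assumed recipe)."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def orthogonalize_rows(weight, direction):
    """Subtract from each row its component along `direction`.

    Each row is treated as a vector in the residual stream, so after this
    step no row can write anything along `direction`.
    """
    cleaned = []
    for row in weight:
        coeff = sum(w * d for w, d in zip(row, direction))
        cleaned.append([w - coeff * d for w, d in zip(row, direction)])
    return cleaned


# Synthetic stand-ins for real activations: the "harmful" samples are
# shifted along coordinate 3, which plays the role of the hidden direction.
random.seed(0)
d_model = 16


def sample(shift):
    v = [random.gauss(0.0, 1.0) for _ in range(d_model)]
    v[3] += shift
    return v


harmless = [sample(0.0) for _ in range(200)]
harmful = [sample(4.0) for _ in range(200)]

direction = candidate_direction(harmful, harmless)
weight = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(32)]
ablated = orthogonalize_rows(weight, direction)

# Largest remaining projection of any row onto the direction (should be ~0).
max_leak = max(abs(sum(w * d for w, d in zip(row, direction))) for row in ablated)
```

In a real model the activations would come from forward passes over the two prompt sets at a chosen layer, and the same projection would be applied to each weight matrix that writes into the residual stream.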

Model Details

Base Model: meta-llama/Llama-3.1-8B-Instruct
Modified Components:
- Embedding layer (W_E)
- Attention output layers (W_O)
- MLP output layers (W_out)
Training Method: No additional training; the modifications were made through geometric interventions on the model weights
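All three listed components write into the residual stream, but their stored shapes differ, so the projection has to be applied along the correct axis. The sketch below assumes that W_E, W_O, and W_out correspond to `embed_tokens`, `o_proj`, and `down_proj` in the Hugging Face Llama implementation; that mapping is an assumption, not something this card states. An embedding matrix of shape (vocab, d_model) keeps residual-stream vectors in its rows, while a linear layer stored as (out_features, in_features) keeps them in its columns. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_from_rows(weight, d):
    """For matrices whose rows are residual-stream vectors, e.g. (vocab, d_model)."""
    return [[w - dot(row, d) * x for w, x in zip(row, d)] for row in weight]


def remove_from_columns(weight, d):
    """For matrices stored as (d_model, d_in), where each column is a residual-stream vector."""
    transposed = [list(col) for col in zip(*weight)]
    cleaned = remove_from_rows(transposed, d)
    return [list(row) for row in zip(*cleaned)]


d = [0.6, 0.8, 0.0]  # unit direction in a toy 3-dimensional residual stream

embedding = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]   # (vocab=2, d_model=3)
out_proj = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # (d_model=3, d_in=2)

embedding_clean = remove_from_rows(embedding, d)
out_proj_clean = remove_from_columns(out_proj, d)

# Remaining projection onto d along the residual-stream axis of each matrix.
row_leak = max(abs(dot(row, d)) for row in embedding_clean)
col_leak = max(abs(dot(col, d)) for col in zip(*out_proj_clean))
```

Applying the projection along the wrong axis would leave the direction intact in the layer's output, so checking the leak on the residual-stream axis is a cheap sanity test.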

Intended Uses

This model is intended for the following uses, while maintaining improved safety characteristics compared to the base model:

General text generation and conversation
Question answering
Task completion
Instruction following

Limitations

The abliteration process may affect some legitimate use cases
The model's behavior modifications are derived from two specific harmful and harmless instruction datasets (see Training Data) and may not generalize beyond them
Performance on certain tasks may differ from the original model

Training Data

No additional training was performed. The weight modifications were derived from model activations on two instruction datasets:

Harmful instructions dataset: mlabonne/harmful_behaviors
Harmless instructions dataset: mlabonne/harmless_alpaca

Ethical Considerations

This model aims to reduce potentially harmful outputs while maintaining functionality. However, users should:

Still implement appropriate content filtering
Monitor outputs for unexpected behavior
Use the model responsibly and in accordance with applicable laws and ethical guidelines

Citation

If you use this model, please cite:

@misc{llama-3.1-8b-instruct-abliterated,
author = {[Your Name]},
title = {Llama-3.1-8B-Instruct-abliterated},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
}

Downloads last month: 14

Safetensors
Model size: 1.98B params
Tensor type: I32 · BF16 · FP16
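The reported 1.98B figure counts stored tensor elements, not logical weights, which is why it is far below 8B. The arithmetic below is a rough sketch under stated assumptions: the published Llama-3.1-8B shapes (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128, vocabulary 128256), a common AWQ layout that packs eight 4-bit weights into each I32 element with one scale and one zero point per group of 128 weights, and embeddings, output head, and norms left unquantized. The group size and the set of quantized layers are assumptions, not values taken from this repository.

```python
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
KV_DIM = 8 * 128          # 8 key/value heads of dimension 128
VOCAB = 128256

PACK = 8                  # assumed: eight 4-bit weights per int32 element
GROUP = 128               # assumed: AWQ group size

# Logical weights in the linear layers of one transformer block.
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q, o and k, v projections
mlp = 3 * HIDDEN * INTERMEDIATE                         # gate, up, down projections
linear_weights = LAYERS * (attention + mlp)

# Tensors assumed to stay unquantized.
embeddings = 2 * VOCAB * HIDDEN                         # input embedding and output head
norms = LAYERS * 2 * HIDDEN + HIDDEN

dense_total = linear_weights + embeddings + norms

# Stored elements after packing.
packed_weights = linear_weights // PACK                 # I32
scales = linear_weights // GROUP                        # 16-bit floats
packed_zeros = scales // PACK                           # I32
stored_total = packed_weights + scales + packed_zeros + embeddings + norms

print(f"dense parameters:  {dense_total / 1e9:.2f}B")
print(f"stored elements:   {stored_total / 1e9:.2f}B")
```

Under these assumptions the stored element count comes out close to the figure shown on the model page, with the packed weights in I32 and the remaining tensors in 16-bit floating point.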
