5456es's picture
Upload README.md with huggingface_hub
84733c7 verified
metadata
license: apache-2.0
base_model: Llama-3.2-1B-Instruct
tags:
  - dpo
  - preference-learning
  - cluster
  - pruned

cluster_prune_Llama-3.2-1B-Instruct_prune_0.5-sigmoid

This model is a DPO (Direct Preference Optimization) fine-tuned version of Llama-3.2-1B-Instruct using the cluster method.

Model Details

  • Base Model: Llama-3.2-1B-Instruct
  • Training Method: cluster
  • Pruning Ratio: unknown
  • Training Date: 2025-09-15

Training Configuration

This model was trained using Direct Preference Optimization (DPO) with the following characteristics:

  • Method: cluster
  • Pruning applied during training
  • Fine-tuned on preference data

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "5456es/cluster_prune_Llama-3.2-1B-Instruct_prune_0.5-sigmoid"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Data

This model was trained on preference data using the DPO algorithm.

Limitations

This model inherits the limitations of its base model and may have additional limitations due to the pruning process.

Citation

If you use this model, please cite the original DPO paper and the base model.