# DPO Fine-Tuned Adapter - LLM Judge Dataset
## 🧠 Model
- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned using TRL's `DPOTrainer` with the LLM Judge preference dataset (50 pairs); the objective it optimizes is shown below
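
For context, `DPOTrainer` minimizes the standard DPO loss over (`prompt`, `chosen`, `rejected`) triples, where β (listed in the table below) controls how strongly the policy is regularized toward the frozen reference model:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ is the chosen response and $y_l$ the rejected one for prompt $x$.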
## ⚙️ Training Parameters
| Parameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size | 4 |
| Epochs | 3 |
| Beta (DPO regularizer) | 0.1 |
| Max Input Length | 1024 tokens |
| Max Prompt Length | 512 tokens |
| Padding Token | `eos_token` |
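
A minimal training sketch using these values, assuming a recent TRL release (with `DPOConfig`) plus PEFT for the LoRA adapter. The LoRA rank, alpha, and dropout below are illustrative assumptions, not values from this card:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # padding token = eos_token, per the table

# Preference pairs with prompt / chosen / rejected columns
dataset = load_dataset("csv", data_files="llm_judge_preferences.csv", split="train")

# LoRA settings are assumptions; the card does not specify them
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="dpo-llmjudge-lora-adapter",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,                # DPO regularizer
    max_length=1024,         # max input length in tokens
    max_prompt_length=512,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # trains only the LoRA adapter weights
)
trainer.train()
trainer.save_model(args.output_dir)
```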
## 📦 Dataset
- Source: `llm_judge_preferences.csv`
- Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
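
A quick schema check, as a sketch with pandas (assuming the CSV sits in the working directory):

```python
import pandas as pd

df = pd.read_csv("llm_judge_preferences.csv")
assert {"prompt", "chosen", "rejected"} <= set(df.columns)
print(len(df))  # expected: 50 preference pairs
```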
## 🚀 Output
- Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter`
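
A usage sketch for loading the adapter on top of the base model with PEFT (the prompt and generation settings are illustrative):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.bfloat16)

# Attach the DPO-trained LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base, "Likhith003/dpo-llmjudge-lora-adapter")
model.eval()

prompt = "Explain what a preference dataset is."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```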