ICM-DPO Enhanced Gemma (Merged Model)
Overview
This is the fully merged model resulting from ICM-DPO training with comprehensive LoRA on google/gemma-3-270m-it. It is the complete, ready-to-use model from Recipe #6 of the Ellora project - no additional PEFT loading required.
For the lightweight PEFT adapter version, see: codelion/gemma-3-270m-icm-dpo-lora
Key Features
- Ready-to-Use: Complete merged model, no PEFT loading needed
- Comprehensive Enhancement: All model capabilities improved via ICM-DPO
- ICM-Generated Training: Completely label-free preference data generation
- DPO Training: Direct preference optimization without reward models
- General Purpose: Enhanced reasoning, coding, creative writing, and more
Model Details
- Base Model: google/gemma-3-270m-it
- Training Method: Direct Preference Optimization (DPO) with comprehensive LoRA
- LoRA Rank: 32 (merged into base model)
- Beta (KL Penalty): 0.5
- Training Data: ICM-generated preference pairs
- Model Size: ~540MB (vs ~141MB for the PEFT adapter)
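For reference, a merged checkpoint like this one can be produced by folding the LoRA adapter into the base weights. A minimal sketch using PEFT (not the exact merge script used for this repo; the output path is arbitrary):

```python
# Sketch: fold the LoRA adapter into the base model to obtain a merged checkpoint.
# Not the exact merge script used for this repo; the output path is arbitrary.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "codelion/gemma-3-270m-icm-dpo-lora")
merged = model.merge_and_unload()  # applies the LoRA deltas to the base weights
merged.save_pretrained("gemma-3-270m-icm-merged")
```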
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model directly - no PEFT required!
model = AutoModelForCausalLM.from_pretrained(
    "codelion/gemma-3-270m-icm",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("codelion/gemma-3-270m-icm")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
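Since the base model is instruction-tuned, wrapping the prompt in the tokenizer's chat template usually gives better-formatted responses. A minimal sketch building on the snippet above:

```python
# Optional: apply the Gemma chat template before generating.
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```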
Capabilities Enhanced
This model shows improvements across multiple domains:
- Reasoning: Logical thinking, mathematical problem solving
- Creative Writing: Story generation, poetry, descriptive text
- Code Generation: Python, JavaScript, SQL code creation
- Question Answering: Factual responses, explanations
- Problem Solving: Step-by-step solutions, systematic thinking
- Instruction Following: Adherence to specific formatting and requirements
Training Details
Dataset
- Source: codelion/gemma-3-270m-icm-dpo
- Method: ICM (Internal Coherence Maximization) for label-free preference generation
- Training Samples: 46,044
- Evaluation Samples: 50
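The preference pairs can be inspected directly with the datasets library. A minimal sketch (the prompt/chosen/rejected column names follow the standard DPO convention and are assumed here):

```python
# Sketch: peek at the ICM-generated preference pairs.
# Column names (prompt / chosen / rejected) are the standard DPO format and assumed here.
from datasets import load_dataset

ds = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one (prompt, chosen, rejected) example
```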
Training Configuration
- Epochs: 3
- Batch Size: 2 (per device)
- Gradient Accumulation: 8 steps
- Learning Rate: 5e-06
- Optimizer: paged_adamw_8bit
- Memory Optimization: BF16, Gradient Checkpointing
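For reference, these hyperparameters map onto a TRL DPO setup roughly as follows. This is a sketch, not the exact Ellora training script; the LoRA alpha and target modules are assumptions, and a recent TRL version is assumed:

```python
# Sketch of a TRL DPO setup matching the hyperparameters listed above.
# Not the exact training script; LoRA alpha and target modules are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "google/gemma-3-270m-it"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
train_ds = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

peft_config = LoraConfig(
    r=32,           # rank from the model details above
    lora_alpha=64,  # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "comprehensive" coverage (assumed)
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo",
    beta=0.5,  # KL penalty
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```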
Methodology: ICM + DPO
ICM (Internal Coherence Maximization)
ICM generates preference pairs without human annotation by:
- Creating diverse prompts across multiple domains
- Generating multiple responses per prompt
- Using systematic evaluation to rank responses
- Creating (prompt, chosen, rejected) preference pairs
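In outline, the pipeline looks something like this (a schematic sketch only, not the actual ICM code; `generate` and `score` stand in for the model sampler and the coherence criterion implemented in the ICM repository):

```python
# Schematic of ICM-style preference pair construction (not the actual ICM code).
# `generate` and `score` are placeholders for the model sampler and the
# label-free coherence criterion from the ICM repository linked below.
def build_preference_pairs(prompts, generate, score, n_candidates=4):
    pairs = []
    for prompt in prompts:
        # 1) sample several candidate responses for the same prompt
        candidates = [generate(prompt) for _ in range(n_candidates)]
        # 2) rank them by the coherence criterion (higher is better)
        ranked = sorted(candidates, key=score, reverse=True)
        # 3) best vs. worst becomes one (prompt, chosen, rejected) pair
        pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs
```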
DPO (Direct Preference Optimization)
DPO directly optimizes the model to:
- Increase probability of chosen responses
- Decrease probability of rejected responses
- Maintain similarity to reference model (KL constraint)
- Learn preferences without reward model training
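Concretely, DPO minimizes the standard objective from the paper linked below over the (prompt, chosen, rejected) triples, where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ the frozen reference model, $\sigma$ the logistic function, and $\beta$ the KL penalty (0.5 in this run):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
\log\sigma\!\left(
\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
-\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
\right)\right]
$$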
When to Use This Model
Use this merged model when:
- You want simplicity - no PEFT loading required
- You have sufficient storage/memory (~540MB)
- You need a standalone model for deployment
- You want maximum compatibility
Use the PEFT adapter when:
- You want to save storage space (~141MB vs ~540MB)
- You want to switch between different adapters
- You're memory constrained
- You want to fine-tune further
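For comparison, loading the adapter version looks roughly like this (a sketch; see the adapter card for the authoritative snippet):

```python
# Sketch: load the lightweight PEFT adapter on top of the base model instead.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "codelion/gemma-3-270m-icm-dpo-lora")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
```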
Related Resources
- Ellora Project: github.com/codelion/ellora
- ICM Repository: github.com/codelion/icm
- PEFT Adapter: codelion/gemma-3-270m-icm-dpo-lora
- Training Dataset: codelion/gemma-3-270m-icm-dpo
- Base Model: google/gemma-3-270m-it
- DPO Paper: Direct Preference Optimization
Innovation Summary
This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:
- ICM generates diverse, high-quality preference pairs automatically
- DPO optimizes preferences directly without reward model complexity
- Comprehensive LoRA maximizes enhancement while maintaining efficiency
- Multi-domain training improves general capabilities, not just specific tasks
This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.