ICM-DPO Enhanced Gemma (Merged Model)

🚀 Overview

This is the fully merged model produced by ICM-DPO training with comprehensive LoRA on google/gemma-3-270m-it. It is the complete, ready-to-use model from Recipe #6 of the Ellora project; no additional PEFT loading is required.

For the lightweight PEFT adapter version, see: codelion/gemma-3-270m-icm-dpo-lora

🔧 Key Features

  • 📦 Ready-to-Use: Complete merged model, no PEFT loading needed
  • 🎯 Comprehensive Enhancement: All model capabilities improved via ICM-DPO
  • 📊 ICM-Generated Training: Completely label-free preference data generation
  • ⚡ DPO Training: Direct preference optimization without reward models
  • 🌐 General Purpose: Enhanced reasoning, coding, creative writing, and more

📊 Model Details

  • Base Model: google/gemma-3-270m-it
  • Training Method: Direct Preference Optimization (DPO) with comprehensive LoRA
  • LoRA Rank: 32 (merged into base model)
  • Beta (KL Penalty): 0.5
  • Training Data: ICM-generated preference pairs
  • Model Size: ~540MB (vs ~141MB for the PEFT adapter)

🔧 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model directly - no PEFT required!
model = AutoModelForCausalLM.from_pretrained(
    "codelion/gemma-3-270m-icm",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("codelion/gemma-3-270m-icm")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
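
Since the underlying model is instruction-tuned, prompts formatted with the tokenizer's chat template will generally match the training format better than raw text. A minimal sketch using the same model and tokenizer loaded above:

# Optional: wrap the prompt in the chat template of the instruction-tuned base model
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(chat_inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))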

🎯 Capabilities Enhanced

This model shows improvements across multiple domains:

  • 🧠 Reasoning: Logical thinking, mathematical problem solving
  • ✍️ Creative Writing: Story generation, poetry, descriptive text
  • 💻 Code Generation: Python, JavaScript, SQL code creation
  • ❓ Question Answering: Factual responses, explanations
  • 🔧 Problem Solving: Step-by-step solutions, systematic thinking
  • 📋 Instruction Following: Adherence to specific formatting and requirements

📈 Training Details

Dataset

  • Source: codelion/gemma-3-270m-icm-dpo
  • Method: ICM (Internal Coherence Maximization) for label-free preference generation
  • Training Samples: 46044
  • Evaluation Samples: 50
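
The preference pairs can be inspected directly from the Hub. A minimal sketch, assuming the usual DPO column layout (prompt / chosen / rejected); check ds.column_names if the dataset uses different field names:

from datasets import load_dataset

# ICM-generated preference pairs used for DPO training
ds = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

sample = ds[0]
print(sample["prompt"])    # the instruction / question
print(sample["chosen"])    # preferred response
print(sample["rejected"])  # dispreferred response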

Training Configuration

  • Epochs: 3
  • Batch Size: 2 (per device)
  • Gradient Accumulation: 8 steps
  • Learning Rate: 5e-06
  • Optimizer: paged_adamw_8bit
  • Memory Optimization: BF16, Gradient Checkpointing
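
Together this gives an effective batch size of 2 × 8 = 16 preference pairs per optimizer step. As a rough illustration, the listed hyperparameters map onto TRL's DPOConfig as follows; this is a reconstruction based on recent TRL releases, not the actual training script:

from trl import DPOConfig

# Hypothetical reconstruction of the configuration listed above
training_args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    beta=0.5,  # KL penalty strength
)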

🔬 Methodology: ICM + DPO

ICM (Internal Coherence Maximization)

ICM generates preference pairs without human annotation by:

  1. Creating diverse prompts across multiple domains
  2. Generating multiple responses per prompt
  3. Using systematic evaluation to rank responses
  4. Creating (prompt, chosen, rejected) preference pairs
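
Schematically, the pipeline has the following shape. This is only an illustration of the four steps above; generate and score_response are hypothetical placeholders standing in for ICM's actual generation and coherence-based ranking logic:

# Schematic only - mirrors the steps above, not the real ICM implementation
def build_preference_pairs(prompts, generate, score_response, n_candidates=4):
    pairs = []
    for prompt in prompts:                                    # 1. diverse prompts
        candidates = [generate(prompt) for _ in range(n_candidates)]  # 2. multiple responses
        ranked = sorted(candidates,
                        key=lambda r: score_response(prompt, r),
                        reverse=True)                         # 3. systematic ranking
        pairs.append({"prompt": prompt,
                      "chosen": ranked[0],
                      "rejected": ranked[-1]})                # 4. preference pair
    return pairs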

DPO (Direct Preference Optimization)

DPO directly optimizes the model to:

  1. Increase probability of chosen responses
  2. Decrease probability of rejected responses
  3. Maintain similarity to reference model (KL constraint)
  4. Learn preferences without reward model training
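
Concretely, these four goals collapse into a single loss over the log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. A minimal sketch of the standard DPO loss, with β = 0.5 as used for this model:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.5):
    # Implicit rewards: how strongly the policy prefers each response vs. the reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the chosen-vs-rejected margin; beta controls the implicit KL constraint
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()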

💡 When to Use This Model

Use this merged model when:

  • ✅ You want simplicity - no PEFT loading required
  • ✅ You have sufficient storage/memory (~540MB)
  • ✅ You need a standalone model for deployment
  • ✅ You want maximum compatibility

Use the PEFT adapter when:

  • ✅ You want to save storage space (~141MB vs ~540MB)
  • ✅ You want to switch between different adapters
  • ✅ You're memory constrained
  • ✅ You want to fine-tune further
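
For comparison, loading the adapter variant looks roughly like this (a sketch assuming the standard PEFT workflow, with the adapter repository linked at the top of this card):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the lightweight ICM-DPO LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "codelion/gemma-3-270m-icm-dpo-lora")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")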


💡 Innovation Summary

This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:

  1. 🎯 ICM generates diverse, high-quality preference pairs automatically
  2. ⚡ DPO optimizes preferences directly without reward model complexity
  3. 🔧 Comprehensive LoRA maximizes enhancement while maintaining efficiency
  4. 🌐 Multi-domain training improves general capabilities, not just specific tasks

This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.
