---
license: other
datasets:
- Argobell/gek408
- Argobell/gek408-dpo
language:
- en
base_model: google/gemma-3n-E2B-it
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- gemma3n
- sft
- dpo
- unsloth
- instruction-tuning
- text-generation
- multimodal
- education
- reasoning
---
# Model Card for `gemma-3n-gek408-dpo`
`gemma-3n-gek408-dpo` is a high-performance, fine-tuned version of [`google/gemma-3n-E2B-it`](https://huggingface.co/google/gemma-3n-E2B-it), optimized for educational and scientific reasoning. It was trained with the **Unsloth** library for significantly faster training and reduced memory usage.
The training followed a two-stage process:
1. **Supervised Fine-Tuning (SFT):** To teach the model the desired instruction-following behavior on scientific and mathematical tasks.
2. **Direct Preference Optimization (DPO):** To align the model's responses with human preferences for clarity, accuracy, and helpfulness.
This model was developed for the **[Google - The Gemma 3n Impact Challenge](https://www.kaggle.com/competitions/google-gemma-3n-hackathon)** competition.
## Model Details
### Model Description
- **Developed by:** Argobell
- **Shared by:** Argobell
- **Model type:** Multimodal model, capable of processing **text, image, and audio inputs**.
- **Finetuned from:** [`google/gemma-3n-E2B-it`](https://huggingface.co/google/gemma-3n-E2B-it)
- **License:** This model is subject to the **Gemma Terms of Use**. Users must agree to and comply with the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
- **Primary Domain:** Education, STEM, Visual Reasoning
### Model Sources
- **Repository:** [Argobell/gemma-3n-gek408-dpo](https://huggingface.co/Argobell/gemma-3n-gek408-dpo)
- **Competition:** [Google - The Gemma 3n Impact Challenge](https://www.kaggle.com/competitions/google-gemma-3n-hackathon)
- **Demo:** [Argobell/kaggle408 on GitHub](https://github.com/Argobell/kaggle408)
## Uses
### Direct Use
This model is ideal for:
- **Math Tutoring Agents:** Guiding students through complex math problems.
- **Educational AI Assistants:** Answering questions based on educational materials.
- **Diagram-based Question Answering:** Interpreting charts, graphs, and scientific diagrams.
- **Visual Reasoning & Explanation:** Explaining logical steps from a visual prompt.
### Downstream Use
This model serves as a strong foundation for:
- **Interactive, offline-ready learning experiences for students in low-connectivity regions.**
- Advanced multimodal AI systems for educational platforms.
- Domain-specific reasoning tools for science and engineering.
- Interactive learning applications in STEM fields.
## Bias, Risks, and Limitations
This model inherits limitations common to most LLMs and has specific risks related to its application:
- **Hallucination:** The model can generate incorrect or fabricated information.
- **Prompt Sensitivity:** The phrasing of a prompt can significantly affect the output quality.
- **Inherited Biases:** It may reflect biases present in the `gemma-3n-E2B-it` base model and the `gek408` dataset.
- **Risk of "Fluent Nonsense":** In educational contexts, the model might generate explanations that sound logical and correct but contain subtle mathematical or scientific inaccuracies. **Human verification is crucial for factual and educational use cases.**
### Recommendations
Always critically evaluate the model's output before use in any real-world application. For educational purposes, outputs should be reviewed by a subject matter expert.
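Beyond expert review, simple programmatic spot checks can catch some numeric errors automatically. The sketch below is purely illustrative: the helper names and the regex-based parsing are assumptions, not part of any shipped tooling, and real tutoring pipelines need far more robust answer extraction.
```python
import re

def extract_final_number(answer: str) -> float | None:
    """Pull the last number out of a model response (hypothetical helper)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", answer)
    return float(matches[-1]) if matches else None

def spot_check(answer: str, reference: float, tol: float = 1e-6) -> bool:
    """Flag answers whose final number disagrees with a trusted reference."""
    value = extract_final_number(answer)
    return value is not None and abs(value - reference) <= tol

# Example: verify a model claim like "... so the area is 12.5 square units."
assert spot_check("... so the area is 12.5 square units.", 0.5 * 5 * 5)
```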
## Getting Started
The model was trained with Unsloth, so loading it through Unsloth is recommended for maximum inference performance.
```python
from unsloth import FastModel
import torch
from transformers import TextStreamer
import gc
# Load the model and tokenizer with 4-bit quantization
model, tokenizer = FastModel.from_pretrained(
    model_name = "Argobell/gemma-3n-gek408-dpo",
    max_seq_length = 1024,  # Choose any for long context!
    load_in_4bit = True,    # 4-bit quantization to reduce memory
    # token = "hf_...",     # needed only for gated models
)

# Helper function for inference
def do_gemma_3n_inference(model, messages, max_new_tokens = 128):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt = True,  # Must add for generation
        tokenize = True,
        return_dict = True,
        return_tensors = "pt",
    ).to("cuda")
    _ = model.generate(
        **inputs,
        max_new_tokens = max_new_tokens,
        temperature = 1.0, top_p = 0.95, top_k = 64,
        streamer = TextStreamer(tokenizer, skip_prompt = True),
    )
    # Cleanup to reduce VRAM usage
    del inputs
    torch.cuda.empty_cache()
    gc.collect()

sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg"
messages = [{
    "role": "user",
    "content": [
        { "type": "image", "image": sloth_link },
        { "type": "text",  "text": "Which films does this animal feature in?" },
    ],
}]

# You might have to wait 1 minute for Unsloth's auto compiler
do_gemma_3n_inference(model, messages, max_new_tokens = 256)
```
```
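The same helper also works for text-only prompts, since the chat template accepts a content list without an image entry. A minimal example:
```python
# Text-only prompt reusing the helper defined above.
messages = [{
    "role": "user",
    "content": [
        { "type": "text", "text": "Explain step by step why the derivative of x^2 is 2x." }
    ],
}]
do_gemma_3n_inference(model, messages, max_new_tokens = 256)
```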
## Training Details
The training was conducted in two distinct phases, using a LoRA-based approach accelerated by Unsloth.
### Phase 1: Supervised Fine-Tuning (SFT)
- **Goal:** To teach the model the fundamental structure of responding to mathematical prompts.
- **Dataset:** [`Argobell/gek408`](https://huggingface.co/datasets/Argobell/gek408)
- **Key Hyperparameters:** The following parameters were used to tune both the vision and language components of the model.
```bash
# SFT Stage Configuration
--max_seq_length 2048
--max_steps 320
--learning_rate 2e-4
--lr_scheduler_type "cosine"
--optim "adamw_torch_fused"
# LoRA Configuration
--tune_vision
--tune_language_layers
--tune_attention_modules
--tune_mlp_modules
--r 16
--alpha 16
--lora_dropout 0.05
# Batching & Memory
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
--gradient_accumulation_steps 8
--gradient_checkpointing
```
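These flags correspond closely to the standard Unsloth + TRL SFT workflow. The sketch below shows one plausible mapping onto code; the author's actual training script, chat formatting, and data collator are not included in this card, so treat every detail as an assumption rather than the exact recipe.
```python
# Illustrative only: one way the SFT flags above could map onto Unsloth + TRL.
from unsloth import FastModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastModel.from_pretrained(
    model_name = "google/gemma-3n-E2B-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# LoRA on both the vision and language towers, per the flags above.
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = True,   # --tune_vision
    finetune_language_layers   = True,   # --tune_language_layers
    finetune_attention_modules = True,   # --tune_attention_modules
    finetune_mlp_modules       = True,   # --tune_mlp_modules
    r = 16, lora_alpha = 16, lora_dropout = 0.05,
)

dataset = load_dataset("Argobell/gek408", split = "train")
# The card does not show the chat formatting used; a real run needs the
# dataset rendered into the Gemma chat template first.

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    processing_class = tokenizer,
    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,
        max_steps = 320,
        learning_rate = 2e-4,
        lr_scheduler_type = "cosine",
        optim = "adamw_torch_fused",
        gradient_checkpointing = True,
    ),
)
trainer.train()
```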
### Phase 2: Direct Preference Optimization (DPO)
- **Goal:** To refine the SFT model by training it to prefer helpful, accurate responses over less desirable ones.
- **Dataset:** [`Argobell/gek408-dpo`](https://huggingface.co/datasets/Argobell/gek408-dpo)
- **Key Hyperparameters:** Starting from the SFT-tuned model, DPO training was performed with the following settings.
```bash
# DPO Stage Configuration
--max_seq_length 2048
--max_prompt_length 1024
--max_steps 100
--learning_rate 5e-6
--optim "adamw_torch_fused"
--warmup_ratio 0.1
--weight_decay 0.01
# LoRA Configuration
--tune_vision
--tune_language_layers
--tune_attention_modules
--tune_mlp_modules
--r 4
--alpha 4
--lora_dropout 0.1
# Batching & Memory
--per_device_train_batch_size 2
--per_device_eval_batch_size 2
--gradient_accumulation_steps 4
--gradient_checkpointing
```
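For orientation, here is a minimal sketch of how the DPO stage might look with TRL's `DPOTrainer` under the settings above. The exact script is not published in this card; `model` and `tokenizer` are assumed to be the SFT-tuned checkpoint from Phase 1, and the DPO `beta` is left at TRL's default since the card does not state it.
```python
# Illustrative only: DPO stage with TRL, starting from the SFT checkpoint.
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

dpo_dataset = load_dataset("Argobell/gek408-dpo", split = "train")
# DPOTrainer expects "prompt" / "chosen" / "rejected" columns.

trainer = DPOTrainer(
    model = model,        # SFT checkpoint with LoRA adapters attached
    ref_model = None,     # with PEFT adapters, TRL derives the reference
                          # model by disabling the adapters
    args = DPOConfig(
        max_length = 2048,
        max_prompt_length = 1024,
        max_steps = 100,
        learning_rate = 5e-6,
        optim = "adamw_torch_fused",
        warmup_ratio = 0.1,
        weight_decay = 0.01,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        gradient_checkpointing = True,
    ),
    train_dataset = dpo_dataset,
    processing_class = tokenizer,
)
trainer.train()
```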
### Infrastructure & Software
- **Hardware:** 1ร NVIDIA RTX 5880 Ada Generation
- **Key Software:**
- **Unsloth:** Used for 2-3x faster training and ~60% less memory usage, enabling more extensive experimentation.
- **Hugging Face TRL:** For implementing the SFT and DPO training loops.
- **Hugging Face Transformers & Datasets.**
## Technical Specifications
### Architecture
Gemma-3n utilizes a Matryoshka Transformer (MatFormer) architecture, which nests smaller, self-contained models within a larger one.
## Acknowledgements
This work would not have been possible without the foundational models and libraries developed by the open-source community. We would like to extend our gratitude to:
- **Google:** For developing and releasing the powerful gemma-3n-E2B-it base model.
- **The Unsloth AI team:** For creating the Unsloth library, which was instrumental in accelerating the training process and reducing computational costs.
- **Hugging Face:** For providing the transformers, datasets, and TRL libraries that formed the backbone of our training and experimentation pipeline.
## Citation
If you use this model in your work, please cite it as follows:
```bibtex
@misc{gemma3ngek408dpo,
author = {Argobell},
title = {gemma-3n-gek408-dpo},
howpublished = {\url{https://huggingface.co/Argobell/gemma-3n-gek408-dpo}},
year = {2025}
}
```
## Model Card Authors
- Argobell
## Contact
For questions, feedback, or collaboration, please reach out via email: [[email protected]](mailto:[email protected])