---
base_model: Gunulhona/Gemma-System-9B
library_name: peft
---

# Gemma-System-9B with MoRA + SimPO

This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (high-rank updating, a parameter-efficient finetuning method) for preference alignment. The model is trained to better align with human preferences through offline preference optimization.

## Model Details

### Model Description

This model is a finetuned version of Gemma-System-9B trained with SimPO (Simple Preference Optimization). It uses a MoRA adapter with rank 256 to finetune the base model efficiently while preserving its core capabilities. A hedged loading sketch appears at the end of this card.

- **Developed by:** Gunulhona (the base model is a merge of Gemma-2-9B-it)
- **Model type:** Causal language model with a MoRA adapter
- **Language(s):** Primarily English and Korean
- **License:** Same as the base model (Gemma-System-9B)
- **Finetuned from model:** Gunulhona/Gemma-System-9B

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** bfloat16 mixed precision
- **Learning rate:** 5e-7
- **Batch size per device:** 1
- **Gradient accumulation steps:** 16
- **Total batch size:** 16
- **Number of epochs:** 200
- **Optimizer:** AdamW with a cosine-with-restarts learning-rate schedule
- **Loss type:** SimPO (configurable)
- **Beta (SimPO):** 10.0
- **SimPO gamma:** 0.5
- **Maximum sequence length:** 65,536 tokens

#### MoRA Configuration

- **Rank (r):** 256
- **Alpha:** 16
- **Dropout:** 0.05
- **MoRA type:** 6
- **Target modules:**
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

(See the configuration sketch at the end of this card.)

### Training Data

The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred (chosen) and non-preferred (rejected) responses for preference learning.

## Technical Specifications

### Model Architecture and Objective

The model uses MoRA for parameter-efficient finetuning with a high-rank update. Training supports either the DPO or SimPO objective; this release uses:

- **SimPO:** Simple Preference Optimization with β = 10.0 and γ = 0.5 (the objective is written out at the end of this card)

### Compute Infrastructure

#### Hardware

- Training performed on CUDA-capable GPUs
- DeepSpeed used for distributed training
- Gradient checkpointing enabled for memory efficiency

#### Software

- PEFT library for parameter-efficient finetuning
- Transformers library
- DeepSpeed for training optimization
- Weights & Biases for experiment tracking

## Environmental Impact

- **Hardware type:** NVIDIA GPUs
- **Training regime:** bfloat16 mixed precision
- **Optimization:** DeepSpeed + gradient checkpointing

## Model Card Contact

For questions about this model, please contact Gunulhona.

### Framework versions

- [PEFT 0.9.0 (MoRA fork)](https://github.com/kongds/MoRA)
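
## SimPO Objective

For reference, SimPO (Meng et al., 2024) is a reference-free preference loss whose implicit reward is the length-normalized log-likelihood of a response under the policy. With the settings used here (β = 10.0, γ = 0.5), the loss for a prompt $x$ with chosen response $y_w$ and rejected response $y_l$ is:

$$
\mathcal{L}_{\text{SimPO}} = -\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)
$$

Unlike DPO, no frozen reference model is needed, which reduces memory use during training.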
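A direct PyTorch translation of the loss, as a minimal sketch rather than the actual training code (the per-sequence log-probabilities and token counts are assumed to be computed by the trainer):

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of y_w under the policy
    rejected_logps: torch.Tensor,  # summed token log-probs of y_l under the policy
    chosen_lens: torch.Tensor,     # |y_w| in tokens
    rejected_lens: torch.Tensor,   # |y_l| in tokens
    beta: float = 10.0,            # "Beta (SimPO)" from this card
    gamma: float = 0.5,            # "SimPO gamma" from this card
) -> torch.Tensor:
    # The length-normalized log-likelihood acts as the implicit reward;
    # no reference model is involved.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Margin loss: push the chosen reward above the rejected reward by at least gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```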
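
## MoRA Configuration Sketch

The hyperparameters listed under "MoRA Configuration" map onto the reference implementation roughly as follows. This is a minimal sketch assuming the [MoRA fork of PEFT](https://github.com/kongds/MoRA) linked under Framework versions, whose `LoraConfig` adds `use_mora` and `mora_type` flags (these arguments do not exist in upstream `peft`); it is not the exact training script.

```python
from peft import LoraConfig, get_peft_model  # MoRA fork of PEFT, not upstream peft
from transformers import AutoModelForCausalLM

# Values copied from the "MoRA Configuration" section of this card.
config = LoraConfig(
    use_mora=True,     # enable MoRA instead of plain LoRA (MoRA-fork-only flag)
    mora_type=6,       # "MoRA type: 6" above (a RoPE-based variant, per the fork's README)
    r=256,
    lora_alpha=16,     # listed on this card; the fork's README notes MoRA may ignore alpha
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Gunulhona/Gemma-System-9B")
model = get_peft_model(base, config)
model.print_trainable_parameters()
```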
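
## Loading Sketch

A minimal inference sketch, assuming the MoRA fork of PEFT is installed (upstream `peft` cannot load MoRA adapters) and that this repository hosts the adapter weights. The repository id below is a placeholder for this model's actual id, and the plain-string prompt skips Gemma's chat template for brevity.

```python
import torch
from peft import PeftModel  # MoRA fork of PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Gunulhona/Gemma-System-9B"
adapter_id = "<this-adapter-repo>"  # placeholder: the repo id of this model card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the SimPO-trained MoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```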