---
base_model: Gunulhona/Gemma-System-9B
library_name: peft
---

# Gemma-System-9B with MoRA + SimPO

This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (high-rank updating, a parameter-efficient finetuning method) for preference alignment. The model is trained to better align with human preferences through offline preference optimization.

## Model Details

### Model Description

This model is a finetuned version of Gemma-System-9B trained with SimPO (Simple Preference Optimization). It uses a MoRA adapter with rank 256 to finetune the base model efficiently while preserving its core capabilities. A hedged loading sketch appears at the end of this card.

- **Developed by:** Gunulhona (the base model is a merge of Gemma-2-9B-it)
- **Model type:** Causal language model with a MoRA adapter
- **Language(s):** Primarily English and Korean
- **License:** Same as the base model (Gemma-System-9B)
- **Finetuned from model:** Gunulhona/Gemma-System-9B

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** bfloat16 mixed precision
- **Learning rate:** 5e-7
- **Batch size per device:** 1
- **Gradient accumulation steps:** 16
- **Total batch size:** 16
- **Number of epochs:** 200
- **Optimizer:** AdamW with a cosine-with-restarts learning-rate schedule
- **Loss type:** SimPO (configurable)
- **Beta (SimPO):** 10.0
- **SimPO gamma:** 0.5
- **Maximum sequence length:** 65,536 tokens

#### MoRA Configuration

- **Rank (r):** 256
- **Alpha:** 16
- **Dropout:** 0.05
- **MoRA type:** 6
- **Target modules:**
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

(See the configuration sketch at the end of this card.)

### Training Data

The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred (chosen) and non-preferred (rejected) responses for preference learning.

## Technical Specifications

### Model Architecture and Objective

The model uses MoRA for parameter-efficient finetuning with a high-rank update. Training supports either the DPO or SimPO objective; this release uses:

- **SimPO:** Simple Preference Optimization with β = 10.0 and γ = 0.5 (the objective is written out at the end of this card)

### Compute Infrastructure

#### Hardware

- Training performed on CUDA-capable GPUs
- DeepSpeed used for distributed training
- Gradient checkpointing enabled for memory efficiency

#### Software

- PEFT library for parameter-efficient finetuning
- Transformers library
- DeepSpeed for training optimization
- Weights & Biases for experiment tracking

## Environmental Impact

- **Hardware type:** NVIDIA GPUs
- **Training regime:** bfloat16 mixed precision
- **Optimization:** DeepSpeed + gradient checkpointing

## Model Card Contact

For questions about this model, please contact Gunulhona.

### Framework versions

- [PEFT 0.9.0 (MoRA fork)](https://github.com/kongds/MoRA)
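
## SimPO Objective

For reference, SimPO (Meng et al., 2024) is a reference-free preference loss whose implicit reward is the length-normalized log-likelihood of a response under the policy. With the settings used here (β = 10.0, γ = 0.5), the loss for a prompt $x$ with chosen response $y_w$ and rejected response $y_l$ is:

$$
\mathcal{L}_{\text{SimPO}} = -\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)
$$

Unlike DPO, no frozen reference model is needed, which reduces memory use during training.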
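A direct PyTorch translation of the loss, as a minimal sketch rather than the actual training code (the per-sequence log-probabilities and token counts are assumed to be computed by the trainer):

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of y_w under the policy
    rejected_logps: torch.Tensor,  # summed token log-probs of y_l under the policy
    chosen_lens: torch.Tensor,     # |y_w| in tokens
    rejected_lens: torch.Tensor,   # |y_l| in tokens
    beta: float = 10.0,            # "Beta (SimPO)" from this card
    gamma: float = 0.5,            # "SimPO gamma" from this card
) -> torch.Tensor:
    # The length-normalized log-likelihood acts as the implicit reward;
    # no reference model is involved.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Margin loss: push the chosen reward above the rejected reward by at least gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```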
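
## MoRA Configuration Sketch

The hyperparameters listed under "MoRA Configuration" map onto the reference implementation roughly as follows. This is a minimal sketch assuming the [MoRA fork of PEFT](https://github.com/kongds/MoRA) linked under Framework versions, whose `LoraConfig` adds `use_mora` and `mora_type` flags (these arguments do not exist in upstream `peft`); it is not the exact training script.

```python
from peft import LoraConfig, get_peft_model  # MoRA fork of PEFT, not upstream peft
from transformers import AutoModelForCausalLM

# Values copied from the "MoRA Configuration" section of this card.
config = LoraConfig(
    use_mora=True,     # enable MoRA instead of plain LoRA (MoRA-fork-only flag)
    mora_type=6,       # "MoRA type: 6" above (a RoPE-based variant, per the fork's README)
    r=256,
    lora_alpha=16,     # listed on this card; the fork's README notes MoRA may ignore alpha
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Gunulhona/Gemma-System-9B")
model = get_peft_model(base, config)
model.print_trainable_parameters()
```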
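
## Loading Sketch

A minimal inference sketch, assuming the MoRA fork of PEFT is installed (upstream `peft` cannot load MoRA adapters) and that this repository hosts the adapter weights. The repository id below is a placeholder for this model's actual id, and the plain-string prompt skips Gemma's chat template for brevity.

```python
import torch
from peft import PeftModel  # MoRA fork of PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Gunulhona/Gemma-System-9B"
adapter_id = "<this-adapter-repo>"  # placeholder: the repo id of this model card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the SimPO-trained MoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```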