---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- dpo
- fdpo
- math
- code
- qwen3
- reasoning
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---

# MNLP M3 fDPO Model (69k samples)

This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base), trained with **filtered Direct Preference Optimization (fDPO)** on the [MNLP M3 DPO dataset](https://huggingface.co/datasets/albertfares/MNLP_M3_dpo_dataset).

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Training Method**: fDPO (filtered Direct Preference Optimization)
- **Dataset**: MNLP M3 mixed dataset (~69k samples)
- **Format**: SafeTensors

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
```

The weights are stored in the SafeTensors format for safer and faster loading.
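For readers unfamiliar with the objective: fDPO adds a data-filtering step on top of the standard DPO loss, discarding low-quality preference pairs before optimization. The sketch below is illustrative only; the filtering criterion, the `min_margin` threshold, and the per-pair score fields are hypothetical names for this example, not taken from the actual training code.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    # Standard DPO objective for one preference pair:
    # -log sigmoid(beta * (policy margin - reference margin)),
    # where each margin is log p(chosen) - log p(rejected).
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

def filter_pairs(pairs, min_margin=0.0):
    # Hypothetical filtering step: keep only pairs whose chosen
    # response outscores the rejected one by at least min_margin.
    return [p for p in pairs
            if p["chosen_score"] - p["rejected_score"] > min_margin]
```

When policy and reference agree exactly, the logits are zero and the loss reduces to log 2, the usual starting point of DPO training.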