MNLP M3 fDPO Model (69k samples)

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base, trained with filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset (~69k samples).

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • Training Method: fDPO (filtered Direct Preference Optimization); a minimal sketch of the objective follows this list
  • Dataset: MNLP M3 mixed preference dataset (~69k samples)
  • Model size: 752M parameters (BF16)
  • Format: SafeTensors
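For reference, here is a minimal sketch of the standard DPO objective together with the filtering step that the "filtered" in fDPO refers to, assuming a margin-based filter over per-pair quality scores. The margin_threshold, the scoring field names, and the toy log-probabilities are illustrative assumptions, not values from the actual training run.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Standard DPO loss on summed per-sequence log-probabilities.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

def filter_pairs(pairs, margin_threshold=0.0):
    # fDPO-style filtering (assumed form): keep only pairs whose chosen response
    # beats the rejected one by at least margin_threshold under some quality
    # score, e.g. a reward model or the SFT model's own log-probabilities.
    return [p for p in pairs
            if p["chosen_score"] - p["rejected_score"] > margin_threshold]

# Toy example with made-up log-probabilities for two preference pairs.
policy_chosen = torch.tensor([-12.0, -9.5])
policy_rejected = torch.tensor([-14.0, -9.0])
ref_chosen = torch.tensor([-12.5, -9.6])
ref_rejected = torch.tensor([-13.5, -9.4])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))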

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")

This model uses SafeTensors format for enhanced security and faster loading.
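A short generation example follows; the prompt and sampling settings are illustrative, and loading in bfloat16 matches the published tensor type.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "albertfares/MNLP_M3_dpo_model_69k", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))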
