# MNLP M3 fDPO Model (69k samples)
This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base using filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset.
## Model Details
- Base Model: Qwen/Qwen3-0.6B-Base
- Training Method: fDPO (filtered Direct Preference Optimization); see the training sketch after this list
- Dataset: MNLP M3 mixed dataset (~69k samples)
- Format: SafeTensors (secure format)
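The exact training recipe is not documented in this card, but the setup can be approximated with TRL. The sketch below is an assumption-laden illustration: it uses TRL's `DPOTrainer`/`DPOConfig`, a hypothetical dataset name and `quality_score` column for the "filtered" step, and placeholder hyperparameters, none of which are confirmed by this repository.

```python
# Minimal fDPO-style training sketch (NOT the actual recipe used for this model).
# Assumptions: TRL's DPOTrainer/DPOConfig, a hypothetical preference dataset with
# "prompt"/"chosen"/"rejected" columns, and placeholder hyperparameters.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Hypothetical dataset id; the real MNLP M3 dataset location is not given here.
dataset = load_dataset("albertfares/MNLP_M3_dpo_dataset", split="train")

# "Filtered" DPO: drop low-quality preference pairs before optimization.
# The quality_score column and the 0.5 threshold are illustrative assumptions.
dataset = dataset.filter(lambda ex: ex.get("quality_score", 1.0) >= 0.5)

config = DPOConfig(
    output_dir="mnlp-m3-fdpo",
    beta=0.1,                       # placeholder DPO temperature
    per_device_train_batch_size=2,  # placeholder
    num_train_epochs=1,             # placeholder
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```

The `filter` call stands in for the fDPO idea of discarding preference pairs judged low quality before standard DPO optimization; the actual filtering criterion used for this model is not documented here.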
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
```
This model uses SafeTensors format for enhanced security and faster loading.
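Once loaded, the model works with the standard `transformers` generation API; the prompt and decoding settings below are illustrative only.

```python
# Continues from the loading snippet above; prompt and settings are examples.
inputs = tokenizer("What is Direct Preference Optimization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```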