MNLP M3 Merged Model (SFT + DPO)

This model merges two MNLP M3 checkpoints into a single model, combining their strengths:

  • SFT Component: mgatti/MNLP_M3_mcqa_model - Multiple-choice QA capabilities
  • DPO Component: albertfares/MNLP_M3_dpo_model - Preference-aligned responses

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • SFT Model: Multiple-choice QA fine-tuned model
  • DPO Model: Direct preference optimized model
  • Merge Strategy: Weight-space merging of the SFT and DPO checkpoints (a minimal sketch follows this list)
  • Combined Capabilities: MCQA + preference alignment
  • Size: 752M parameters (F32, Safetensors)
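
The card does not name the exact merge algorithm. As a rough illustration only, the sketch below assumes a simple linear interpolation of the SFT and DPO parameters; the mixing coefficient alpha and the output directory name are illustrative, not taken from this model's actual recipe.

# Hedged sketch: assumes plain linear interpolation of the two checkpoints' weights.
import torch
from transformers import AutoModelForCausalLM

sft = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model", torch_dtype=torch.float32)
dpo = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model", torch_dtype=torch.float32)

alpha = 0.5  # illustrative mixing coefficient, not the value used for this model
dpo_state = dpo.state_dict()
merged_state = {
    # Both checkpoints share the Qwen3-0.6B-Base architecture, so tensors align by name
    name: alpha * param + (1.0 - alpha) * dpo_state[name]
    for name, param in sft.state_dict().items()
}

sft.load_state_dict(merged_state)
sft.save_pretrained("merged_mnlp_m3_sft_dpo")  # illustrative local output directory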

Capabilities

✅ Multiple-Choice Question Answering (from SFT component)
✅ Preference-Aligned Generation (from DPO component)
✅ Math and Code Generation (from MNLP M3 training)
✅ Reasoning Tasks (combined strengths)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/DPO_MCQA_model_3_05_05_08")
tokenizer = AutoTokenizer.from_pretrained("albertfares/DPO_MCQA_model_3_05_05_08")

# For MCQA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For general generation
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
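
Free-form generation may answer in full sentences rather than a single letter. A common alternative for MCQA (an assumption here, not something this card specifies) is to score each option by its log-likelihood under the model and pick the highest-scoring one, reusing the model and tokenizer loaded above:

# Hedged sketch: per-option log-likelihood scoring, a common MCQA approach;
# not necessarily how this model was evaluated.
import torch

question = "Which of the following is correct?"
options = {"A": "2+2=5", "B": "2+2=4", "C": "2+2=3"}

scores = {}
for letter, text in options.items():
    candidate = f"{question}\nAnswer: {letter}) {text}"
    ids = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean negative log-likelihood
        # of the whole sequence (a simplification; one could score only the answer tokens)
        out = model(ids, labels=ids)
    scores[letter] = -out.loss.item()  # higher (less negative) is better

print("Predicted answer:", max(scores, key=scores.get))  # expected: B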

Training Data

  • SFT: Multiple-choice QA dataset
  • DPO: MNLP M3 preference dataset with math, code, and reasoning (see the DPO-objective sketch below)
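
For reference, the DPO component optimizes a preference objective over chosen/rejected response pairs. The sketch below shows that objective with toy log-probabilities; beta and all numbers are illustrative and not taken from the actual MNLP M3 training run.

# Hedged sketch of the DPO loss for one preference pair; toy values only.
import torch
import torch.nn.functional as F

beta = 0.1  # illustrative strength of the implicit KL penalty

# Sequence log-probabilities of the chosen/rejected responses under the
# policy being trained and under the frozen reference (SFT) model.
policy_chosen_logp = torch.tensor(-12.0)
policy_rejected_logp = torch.tensor(-15.0)
ref_chosen_logp = torch.tensor(-13.0)
ref_rejected_logp = torch.tensor(-13.5)

# DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
chosen_logratio = policy_chosen_logp - ref_chosen_logp
rejected_logratio = policy_rejected_logp - ref_rejected_logp
loss = -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))
print(loss.item())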

This merged model should excel at both structured QA tasks and open-ended generation with preference alignment.
