MNLP M3 Merged Model (SFT + DPO)

This model merges two MNLP M3 checkpoints into a single model, combining their strengths:

  • SFT Component: mgatti/MNLP_M3_mcqa_model - Multiple-choice QA capabilities
  • DPO Component: albertfares/MNLP_M3_dpo_model - Preference-aligned responses

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • SFT Model: Multiple-choice QA fine-tuned model
  • DPO Model: Direct preference optimized model
  • Merge Strategy: Weight-space merging of the SFT and DPO checkpoints (a minimal sketch follows this list)
  • Combined Capabilities: MCQA + preference alignment
  • Size: 752M parameters (F32, Safetensors)
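
The card does not name the exact merge algorithm. As a rough illustration only, the sketch below assumes a simple linear interpolation of the SFT and DPO parameters; the mixing coefficient alpha and the output directory name are illustrative, not taken from this model's actual recipe.

# Hedged sketch: assumes plain linear interpolation of the two checkpoints' weights.
import torch
from transformers import AutoModelForCausalLM

sft = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model", torch_dtype=torch.float32)
dpo = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model", torch_dtype=torch.float32)

alpha = 0.5  # illustrative mixing coefficient, not the value used for this model
dpo_state = dpo.state_dict()
merged_state = {
    # Both checkpoints share the Qwen3-0.6B-Base architecture, so tensors align by name
    name: alpha * param + (1.0 - alpha) * dpo_state[name]
    for name, param in sft.state_dict().items()
}

sft.load_state_dict(merged_state)
sft.save_pretrained("merged_mnlp_m3_sft_dpo")  # illustrative local output directory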

Capabilities

✅ Multiple-Choice Question Answering (from SFT component)
✅ Preference-Aligned Generation (from DPO component)
✅ Math and Code Generation (from MNLP M3 training)
✅ Reasoning Tasks (combined strengths)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/DPO_MCQA_model_3_05_05_08")
tokenizer = AutoTokenizer.from_pretrained("albertfares/DPO_MCQA_model_3_05_05_08")

# For MCQA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For general generation
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
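
Free-form generation may answer in full sentences rather than a single letter. A common alternative for MCQA (an assumption here, not something this card specifies) is to score each option by its log-likelihood under the model and pick the highest-scoring one, reusing the model and tokenizer loaded above:

# Hedged sketch: per-option log-likelihood scoring, a common MCQA approach;
# not necessarily how this model was evaluated.
import torch

question = "Which of the following is correct?"
options = {"A": "2+2=5", "B": "2+2=4", "C": "2+2=3"}

scores = {}
for letter, text in options.items():
    candidate = f"{question}\nAnswer: {letter}) {text}"
    ids = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean negative log-likelihood
        # of the whole sequence (a simplification; one could score only the answer tokens)
        out = model(ids, labels=ids)
    scores[letter] = -out.loss.item()  # higher (less negative) is better

print("Predicted answer:", max(scores, key=scores.get))  # expected: B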

Training Data

  • SFT: Multiple-choice QA dataset
  • DPO: MNLP M3 preference dataset with math, code, and reasoning (see the DPO-objective sketch below)
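
For reference, the DPO component optimizes a preference objective over chosen/rejected response pairs. The sketch below shows that objective with toy log-probabilities; beta and all numbers are illustrative and not taken from the actual MNLP M3 training run.

# Hedged sketch of the DPO loss for one preference pair; toy values only.
import torch
import torch.nn.functional as F

beta = 0.1  # illustrative strength of the implicit KL penalty

# Sequence log-probabilities of the chosen/rejected responses under the
# policy being trained and under the frozen reference (SFT) model.
policy_chosen_logp = torch.tensor(-12.0)
policy_rejected_logp = torch.tensor(-15.0)
ref_chosen_logp = torch.tensor(-13.0)
ref_rejected_logp = torch.tensor(-13.5)

# DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
chosen_logratio = policy_chosen_logp - ref_chosen_logp
rejected_logratio = policy_rejected_logp - ref_rejected_logp
loss = -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))
print(loss.item())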

This merged model should excel at both structured QA tasks and open-ended generation with preference alignment.
