# MNLP M3 Merged Model (SFT + DPO)
This model combines the strengths of two fine-tuned checkpoints:

- SFT Component: mgatti/MNLP_M3_mcqa_model (multiple-choice QA capabilities)
- DPO Component: albertfares/MNLP_M3_dpo_model (preference-aligned responses)
## Model Details
- Base Model: Qwen/Qwen3-0.6B-Base
- SFT Model: Multiple-choice QA fine-tuned model
- DPO Model: Direct preference optimized model
- Merge Strategy: Weight-space merging of the SFT and DPO checkpoints (see the sketch after this list)
- Combined Capabilities: MCQA + preference alignment
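The exact merge recipe is not documented in this card. As a minimal sketch of one common weight-space strategy, plain linear interpolation of parameter tensors, assuming both checkpoints share the same architecture; the 50/50 mixing weight is an arbitrary assumption, not the recipe actually used:

```python
from transformers import AutoModelForCausalLM

# Illustrative linear interpolation of two checkpoints that share an
# architecture. The mixing weight and this exact recipe are assumptions,
# not the documented merge used for this model.
sft = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")
dpo = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model")

alpha = 0.5  # fraction of the DPO weights in the merge (assumed)
sft_state = sft.state_dict()
dpo_state = dpo.state_dict()
merged_state = {
    name: (1 - alpha) * param + alpha * dpo_state[name]
    for name, param in sft_state.items()
}

sft.load_state_dict(merged_state)
sft.save_pretrained("merged_mnlp_m3_sft_dpo")
```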
## Capabilities

- ✅ Multiple-Choice Question Answering (from SFT component)
- ✅ Preference-Aligned Generation (from DPO component)
- ✅ Math and Code Generation (from MNLP M3 training)
- ✅ Reasoning Tasks (combined strengths)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merged_mnlp_m3_sft_dpo")
tokenizer = AutoTokenizer.from_pretrained("merged_mnlp_m3_sft_dpo")

# Multiple-choice QA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Open-ended generation (temperature only takes effect with do_sample=True)
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
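Free-form generation is not always the most reliable way to answer multiple-choice questions. A common alternative is to score each option by the log-likelihood the model assigns to it and pick the highest-scoring one. The sketch below illustrates this; the scoring scheme and the `score_option` helper are assumptions for illustration, not this model's documented evaluation protocol:

```python
import torch

def score_option(model, tokenizer, question: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to `option` given `question`.

    Note: re-tokenizing question + option assumes the prompt tokens stay a
    prefix of the combined sequence, which can be off by a token at BPE
    boundaries; good enough for a sketch.
    """
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input sequence.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    option_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, full_ids[0, pos + 1]].item() for pos in option_positions)

question = "Which of the following is correct?"
options = ["2+2=5", "2+2=4", "2+2=3"]
best = max(options, key=lambda o: score_option(model, tokenizer, question, o))
print(best)  # expected: 2+2=4
```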
## Training Data
- SFT: Multiple-choice QA dataset
- DPO: MNLP M3 preference dataset with math, code, and reasoning
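For reference, DPO training data is typically organized as preference pairs. The example below illustrates the schema commonly used by DPO trainers such as TRL's; the exact fields and contents of the MNLP M3 dataset are an assumption here:

```python
# Illustrative preference pair in the prompt/chosen/rejected schema
# commonly used for DPO training; not an actual record from this dataset.
preference_example = {
    "prompt": "Write a Python function that returns the n-th Fibonacci number.",
    "chosen": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a",
    "rejected": "def fib(n):\n    return fib(n - 1) + fib(n - 2)  # missing base case",
}
```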
This merged model should excel at both structured QA tasks and open-ended generation with preference alignment.