---
language: en
license: other
tags:
- qwen
- grpo
- instruct
- fine-tuned
- reasoning
- 3b
- menda
datasets:
- custom
model-index:
- name: Menda-3b-750
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: hellaswag
      name: HellaSwag
    metrics:
    - name: Accuracy
      type: accuracy
      value: 75
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: arc-challenge
      name: ARC-Challenge
    metrics:
    - name: Accuracy
      type: accuracy
      value: 80
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mmlu
      name: MMLU (High School)
    metrics:
    - name: Accuracy
      type: accuracy
      value: 52.5
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: truthfulqa
      name: TruthfulQA
    metrics:
    - name: Accuracy
      type: accuracy
      value: 55
---
# Menda-3b-750: GRPO-Tuned Qwen2.5 Model

Menda-3b-750 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 750 steps. The model shows improved performance on reasoning benchmarks relative to the base model.
## Model Details

- Base Model: Qwen2.5-3B-Instruct
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Steps: 750
- Context Length: 4096 tokens
- Parameters: 3 billion
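For deployment planning, the parameter count above translates into a rough weight-memory footprint. The arithmetic below is an illustrative estimate only (actual usage also depends on activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope weight memory for a 3B-parameter model.
# These are estimates, not measured numbers for Menda-3b-750.
params = 3_000_000_000

bytes_fp16 = params * 2   # fp16/bf16: 2 bytes per parameter
bytes_int4 = params // 2  # 4-bit quantized: ~0.5 bytes per parameter

print(f"fp16 weights: ~{bytes_fp16 / 1e9:.1f} GB")   # ~6.0 GB
print(f"4-bit weights: ~{bytes_int4 / 1e9:.1f} GB")  # ~1.5 GB
```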
## Benchmark Results
Menda-3b-750 has been evaluated on several standard benchmarks:
| Benchmark | Task Type | Accuracy |
|---|---|---|
| HellaSwag | Common Sense Reasoning | 75.0% |
| ARC-Challenge | Scientific Reasoning | 80.0% |
| MMLU (High School) | Multi-domain Knowledge | 52.5% |
| TruthfulQA | Factual Accuracy | 55.0% |
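If a single headline number is wanted, the unweighted mean of the four reported accuracies can be computed directly. This is a convenience summary only; the benchmarks measure different skills and are not strictly comparable:

```python
# Unweighted mean of the four benchmark accuracies reported in the table above.
scores = {
    "HellaSwag": 75.0,
    "ARC-Challenge": 80.0,
    "MMLU (High School)": 52.5,
    "TruthfulQA": 55.0,
}

average = sum(scores.values()) / len(scores)
print(f"Average accuracy: {average:.3f}%")  # → 65.625%
```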
## Detailed Benchmark Results
<details>
<summary>HellaSwag Results (click to expand)</summary>

```json
{
  "results": {
    "hellaswag": {
      "acc": 0.75,
      "acc_norm": 0.75
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>ARC-Challenge Results (click to expand)</summary>

```json
{
  "results": {
    "arc_challenge": {
      "acc": 0.8,
      "acc_norm": 0.8
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>MMLU (High School) Results (click to expand)</summary>

```json
{
  "results": {
    "mmlu_high_school": {
      "acc": 0.525,
      "subjects": {
        "high_school_mathematics": 0.4,
        "high_school_physics": 0.7,
        "high_school_biology": 0.6,
        "high_school_chemistry": 0.4
      }
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>TruthfulQA Results (click to expand)</summary>

```json
{
  "results": {
    "truthfulqa_mc": {
      "acc": 0.55,
      "mc1": 0.55,
      "mc2": 0.55
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
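The result files above share a common layout: a `results` object keyed by task name, with per-metric values inside, plus a `config` object describing the run. A small helper can pull any metric out of such a blob; `get_metric` below is an illustrative name, not part of any library:

```python
import json

def get_metric(report: str, task: str, metric: str) -> float:
    """Extract one metric from a results blob shaped like those above.

    Illustrative helper, not a library function: walks results -> task -> metric.
    """
    data = json.loads(report)
    return data["results"][task][metric]

# Sample blob mirroring the ARC-Challenge results shown above.
report = """
{
  "results": {
    "arc_challenge": {"acc": 0.8, "acc_norm": 0.8}
  },
  "config": {"model": "qwen_grpo_750", "num_fewshot": 0, "batch_size": 1}
}
"""

print(get_metric(report, "arc_challenge", "acc"))  # → 0.8
```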
## Usage Examples

### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the concept of machine learning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
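Since the base model is instruction-tuned, prompts generally work better when wrapped in the ChatML conversation format that Qwen2.5-Instruct was trained on. In practice you would let `tokenizer.apply_chat_template` produce this; the sketch below assembles the same format by hand purely to show its structure:

```python
# Build a Qwen2.5-style ChatML prompt by hand (illustration only; in real use,
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# produces this format for you).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the concept of machine learning in simple terms."},
]

prompt = ""
for m in messages:
    prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
prompt += "<|im_start|>assistant\n"  # model's turn begins here

print(prompt)
```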
### Using with Ollama

You can also use this model with Ollama by converting it to GGUF format:

```bash
# Convert to GGUF (convert_hf_to_gguf.py ships with the llama.cpp repository;
# it expects a local copy of the model, so download the repo first)
python convert_hf_to_gguf.py weathermanj/Menda-3b-750 --outfile menda-3b-750.gguf

# Create the Ollama model
cat > Modelfile << EOF
FROM menda-3b-750.gguf
TEMPLATE """{{ .Prompt }}"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
EOF

ollama create menda-3b-750 -f Modelfile
ollama run menda-3b-750
```
## License

This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the Qwen2.5 license terms for details.