---
language: en
license: other
tags:
- qwen
- grpo
- instruct
- fine-tuned
- reasoning
- 3b
- menda
datasets:
- custom
model-index:
- name: Menda-3b-750
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: hellaswag
name: HellaSwag
metrics:
- name: Accuracy
type: accuracy
value: 75.0
- task:
type: text-generation
name: Text Generation
dataset:
type: arc-challenge
name: ARC-Challenge
metrics:
- name: Accuracy
type: accuracy
value: 80.0
- task:
type: text-generation
name: Text Generation
dataset:
type: mmlu
name: MMLU (High School)
metrics:
- name: Accuracy
type: accuracy
value: 52.5
- task:
type: text-generation
name: Text Generation
dataset:
type: truthfulqa
name: TruthfulQA
metrics:
- name: Accuracy
type: accuracy
value: 55.0
---
# Menda-3b-750: GRPO-Tuned Qwen2.5 Model
Menda-3b-750 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 750 steps. It shows improved performance on reasoning benchmarks compared to the base model.
## Model Details
- **Base Model**: Qwen2.5-3B-Instruct
- **Training Method**: GRPO (Group Relative Policy Optimization; sketched below this list)
- **Training Steps**: 750
- **Context Length**: 4096 tokens
- **Parameters**: 3 billion
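For readers unfamiliar with the method: GRPO scores a group of sampled completions per prompt and normalizes each reward against that group's statistics, which removes the need for a separate learned value critic. The snippet below is a minimal, illustrative sketch of that advantage computation only, not the actual training code used for this model:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each completion's reward against its own group's mean and
    standard deviation (one row = one prompt's group of sampled completions)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled completions each (toy rewards)
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.2, 0.8, 0.4, 0.6]])
print(group_relative_advantages(rewards))
```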
## Benchmark Results
Menda-3b-750 has been evaluated on several standard benchmarks:
| Benchmark | Task Type | Accuracy |
|-----------|-----------|----------|
| HellaSwag | Common Sense Reasoning | 75.0% |
| ARC-Challenge | Scientific Reasoning | 80.0% |
| MMLU (High School) | Multi-domain Knowledge | 52.5% |
| TruthfulQA | Factual Accuracy | 55.0% |
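The per-task JSON dumps below follow the output shape of EleutherAI's lm-evaluation-harness (`num_fewshot`, `batch_size`), so a zero-shot run of that harness should approximate these numbers. A hedged reproduction sketch, assuming lm-eval 0.4+ is installed and that the task names match your installed version:

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=weathermanj/Menda-3b-750",
    # Task names are assumptions and may differ by harness version;
    # MMLU high-school subjects run as separate mmlu_high_school_* tasks.
    tasks=["hellaswag", "arc_challenge", "truthfulqa_mc2"],
    num_fewshot=0,
    batch_size=1,
)
print(results["results"])
```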
## Detailed Benchmark Results
<details>
<summary>HellaSwag Results (click to expand)</summary>

```json
{
  "results": {
    "hellaswag": {
      "acc": 0.75,
      "acc_norm": 0.75
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>ARC-Challenge Results (click to expand)</summary>

```json
{
  "results": {
    "arc_challenge": {
      "acc": 0.8,
      "acc_norm": 0.8
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>MMLU (High School) Results (click to expand)</summary>

```json
{
  "results": {
    "mmlu_high_school": {
      "acc": 0.525,
      "subjects": {
        "high_school_mathematics": 0.4,
        "high_school_physics": 0.7,
        "high_school_biology": 0.6,
        "high_school_chemistry": 0.4
      }
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>TruthfulQA Results (click to expand)</summary>

```json
{
  "results": {
    "truthfulqa_mc": {
      "acc": 0.55,
      "mc1": 0.55,
      "mc2": 0.55
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
## Usage Examples
### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Qwen2.5-Instruct models expect the ChatML chat template, so build the
# prompt with apply_chat_template rather than passing raw text
messages = [{"role": "user", "content": "Explain the concept of machine learning in simple terms."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=300)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Using with Ollama
You can also use this model with Ollama by converting it to GGUF format:
```bash
# Download the weights locally, then convert to GGUF with llama.cpp's
# convert_hf_to_gguf.py script (run from a llama.cpp checkout)
huggingface-cli download weathermanj/Menda-3b-750 --local-dir Menda-3b-750
python convert_hf_to_gguf.py Menda-3b-750 --outfile menda-3b-750.gguf

# Create the Ollama model; the template follows Qwen's ChatML format
cat > Modelfile << 'EOF'
FROM menda-3b-750.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
EOF

ollama create menda-3b-750 -f Modelfile
ollama run menda-3b-750
```
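Once the model is created, you can also call it programmatically. A minimal sketch using the `ollama` Python client (`pip install ollama`), assuming the `menda-3b-750` model built above is available on a local Ollama server:

```python
import ollama

# Generate against the local Ollama server (default: http://localhost:11434)
response = ollama.generate(
    model="menda-3b-750",
    prompt="Explain the concept of machine learning in simple terms.",
)
print(response["response"])
```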
## License
This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.