---
language: en
license: other
tags:
- qwen
- grpo
- instruct
- fine-tuned
- reasoning
- 3b
- menda
datasets:
- custom
model-index:
- name: Menda-3b-750
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: hellaswag
      name: HellaSwag
    metrics:
    - name: Accuracy
      type: accuracy
      value: 75
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: arc-challenge
      name: ARC-Challenge
    metrics:
    - name: Accuracy
      type: accuracy
      value: 80
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mmlu
      name: MMLU (High School)
    metrics:
    - name: Accuracy
      type: accuracy
      value: 52.5
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: truthfulqa
      name: TruthfulQA
    metrics:
    - name: Accuracy
      type: accuracy
      value: 55
---
# Menda-3b-750: GRPO-Tuned Qwen2.5 Model

Menda-3b-750 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 750 steps. The model shows improved performance on reasoning benchmarks relative to the base model.
## Model Details

- Base Model: Qwen2.5-3B-Instruct
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Steps: 750
- Context Length: 4096 tokens
- Parameters: 3 billion
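For deployment planning, the parameter count above translates into a rough weight-memory footprint. The arithmetic below is an illustrative estimate only (actual usage also depends on activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope weight memory for a 3B-parameter model.
# These are estimates, not measured numbers for Menda-3b-750.
params = 3_000_000_000

bytes_fp16 = params * 2   # fp16/bf16: 2 bytes per parameter
bytes_int4 = params // 2  # 4-bit quantized: ~0.5 bytes per parameter

print(f"fp16 weights: ~{bytes_fp16 / 1e9:.1f} GB")   # ~6.0 GB
print(f"4-bit weights: ~{bytes_int4 / 1e9:.1f} GB")  # ~1.5 GB
```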
## Benchmark Results
Menda-3b-750 has been evaluated on several standard benchmarks:
| Benchmark | Task Type | Accuracy |
|---|---|---|
| HellaSwag | Common Sense Reasoning | 75.0% |
| ARC-Challenge | Scientific Reasoning | 80.0% |
| MMLU (High School) | Multi-domain Knowledge | 52.5% |
| TruthfulQA | Factual Accuracy | 55.0% |
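If a single headline number is wanted, the unweighted mean of the four reported accuracies can be computed directly. This is a convenience summary only; the benchmarks measure different skills and are not strictly comparable:

```python
# Unweighted mean of the four benchmark accuracies reported in the table above.
scores = {
    "HellaSwag": 75.0,
    "ARC-Challenge": 80.0,
    "MMLU (High School)": 52.5,
    "TruthfulQA": 55.0,
}

average = sum(scores.values()) / len(scores)
print(f"Average accuracy: {average:.3f}%")  # → 65.625%
```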
## Detailed Benchmark Results
<details>
<summary>HellaSwag Results (click to expand)</summary>

```json
{
  "results": {
    "hellaswag": {
      "acc": 0.75,
      "acc_norm": 0.75
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>ARC-Challenge Results (click to expand)</summary>

```json
{
  "results": {
    "arc_challenge": {
      "acc": 0.8,
      "acc_norm": 0.8
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>MMLU (High School) Results (click to expand)</summary>

```json
{
  "results": {
    "mmlu_high_school": {
      "acc": 0.525,
      "subjects": {
        "high_school_mathematics": 0.4,
        "high_school_physics": 0.7,
        "high_school_biology": 0.6,
        "high_school_chemistry": 0.4
      }
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>TruthfulQA Results (click to expand)</summary>

```json
{
  "results": {
    "truthfulqa_mc": {
      "acc": 0.55,
      "mc1": 0.55,
      "mc2": 0.55
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
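The result files above share a common layout: a `results` object keyed by task name, with per-metric values inside, plus a `config` object describing the run. A small helper can pull any metric out of such a blob; `get_metric` below is an illustrative name, not part of any library:

```python
import json

def get_metric(report: str, task: str, metric: str) -> float:
    """Extract one metric from a results blob shaped like those above.

    Illustrative helper, not a library function: walks results -> task -> metric.
    """
    data = json.loads(report)
    return data["results"][task][metric]

# Sample blob mirroring the ARC-Challenge results shown above.
report = """
{
  "results": {
    "arc_challenge": {"acc": 0.8, "acc_norm": 0.8}
  },
  "config": {"model": "qwen_grpo_750", "num_fewshot": 0, "batch_size": 1}
}
"""

print(get_metric(report, "arc_challenge", "acc"))  # → 0.8
```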
## Usage Examples

### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the concept of machine learning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
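Since the base model is instruction-tuned, prompts generally work better when wrapped in the ChatML conversation format that Qwen2.5-Instruct was trained on. In practice you would let `tokenizer.apply_chat_template` produce this; the sketch below assembles the same format by hand purely to show its structure:

```python
# Build a Qwen2.5-style ChatML prompt by hand (illustration only; in real use,
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# produces this format for you).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the concept of machine learning in simple terms."},
]

prompt = ""
for m in messages:
    prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
prompt += "<|im_start|>assistant\n"  # model's turn begins here

print(prompt)
```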
### Using with Ollama

You can also use this model with Ollama by converting it to GGUF format:

```bash
# Convert to GGUF (convert_hf_to_gguf.py ships with the llama.cpp repository;
# it expects a local copy of the model, so download the repo first)
python convert_hf_to_gguf.py weathermanj/Menda-3b-750 --outfile menda-3b-750.gguf

# Create the Ollama model
cat > Modelfile << EOF
FROM menda-3b-750.gguf
TEMPLATE """{{ .Prompt }}"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
EOF

ollama create menda-3b-750 -f Modelfile
ollama run menda-3b-750
```
## License

This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the Qwen2.5 license terms for details.