---
language: en
license: other
tags:
- qwen
- grpo
- instruct
- fine-tuned
- reasoning
- 3b
- menda
datasets:
- custom
model-index:
- name: Menda-3b-750
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: hellaswag
name: HellaSwag
metrics:
- name: Accuracy
type: accuracy
value: 75.0
- task:
type: text-generation
name: Text Generation
dataset:
type: arc-challenge
name: ARC-Challenge
metrics:
- name: Accuracy
type: accuracy
value: 80.0
- task:
type: text-generation
name: Text Generation
dataset:
type: mmlu
name: MMLU (High School)
metrics:
- name: Accuracy
type: accuracy
value: 52.5
- task:
type: text-generation
name: Text Generation
dataset:
type: truthfulqa
name: TruthfulQA
metrics:
- name: Accuracy
type: accuracy
value: 55.0
---
# Menda-3b-750: GRPO-Tuned Qwen2.5 Model
Menda-3b-750 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 750 steps. It shows improved performance on reasoning benchmarks compared to the base model.
## Model Details
- **Base Model**: Qwen2.5-3B-Instruct
- **Training Method**: GRPO (Group Relative Policy Optimization; sketched below this list)
- **Training Steps**: 750
- **Context Length**: 4096 tokens
- **Parameters**: 3 billion
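For readers unfamiliar with the method: GRPO scores a group of sampled completions per prompt and normalizes each reward against that group's statistics, which removes the need for a separate learned value critic. The snippet below is a minimal, illustrative sketch of that advantage computation only, not the actual training code used for this model:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each completion's reward against its own group's mean and
    standard deviation (one row = one prompt's group of sampled completions)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled completions each (toy rewards)
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.2, 0.8, 0.4, 0.6]])
print(group_relative_advantages(rewards))
```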
## Benchmark Results
Menda-3b-750 has been evaluated on several standard benchmarks:
| Benchmark | Task Type | Accuracy |
|-----------|-----------|----------|
| HellaSwag | Common Sense Reasoning | 75.0% |
| ARC-Challenge | Scientific Reasoning | 80.0% |
| MMLU (High School) | Multi-domain Knowledge | 52.5% |
| TruthfulQA | Factual Accuracy | 55.0% |
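The per-task JSON dumps below follow the output shape of EleutherAI's lm-evaluation-harness (`num_fewshot`, `batch_size`), so a zero-shot run of that harness should approximate these numbers. A hedged reproduction sketch, assuming lm-eval 0.4+ is installed and that the task names match your installed version:

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=weathermanj/Menda-3b-750",
    # Task names are assumptions and may differ by harness version;
    # MMLU high-school subjects run as separate mmlu_high_school_* tasks.
    tasks=["hellaswag", "arc_challenge", "truthfulqa_mc2"],
    num_fewshot=0,
    batch_size=1,
)
print(results["results"])
```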
## Detailed Benchmark Results
<details>
<summary>HellaSwag Results (click to expand)</summary>

```json
{
  "results": {
    "hellaswag": {
      "acc": 0.75,
      "acc_norm": 0.75
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>ARC-Challenge Results (click to expand)</summary>

```json
{
  "results": {
    "arc_challenge": {
      "acc": 0.8,
      "acc_norm": 0.8
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>MMLU (High School) Results (click to expand)</summary>

```json
{
  "results": {
    "mmlu_high_school": {
      "acc": 0.525,
      "subjects": {
        "high_school_mathematics": 0.4,
        "high_school_physics": 0.7,
        "high_school_biology": 0.6,
        "high_school_chemistry": 0.4
      }
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
<details>
<summary>TruthfulQA Results (click to expand)</summary>

```json
{
  "results": {
    "truthfulqa_mc": {
      "acc": 0.55,
      "mc1": 0.55,
      "mc2": 0.55
    }
  },
  "config": {
    "model": "qwen_grpo_750",
    "num_fewshot": 0,
    "batch_size": 1
  }
}
```

</details>
## Usage Examples
### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Qwen2.5-Instruct models expect the ChatML chat template, so build the
# prompt with apply_chat_template rather than passing raw text
messages = [{"role": "user", "content": "Explain the concept of machine learning in simple terms."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=300)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Using with Ollama
You can also use this model with Ollama by converting it to GGUF format:
```bash
# Download the weights locally, then convert to GGUF with llama.cpp's
# convert_hf_to_gguf.py script (run from a llama.cpp checkout)
huggingface-cli download weathermanj/Menda-3b-750 --local-dir Menda-3b-750
python convert_hf_to_gguf.py Menda-3b-750 --outfile menda-3b-750.gguf

# Create the Ollama model; the template follows Qwen's ChatML format
cat > Modelfile << 'EOF'
FROM menda-3b-750.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
EOF

ollama create menda-3b-750 -f Modelfile
ollama run menda-3b-750
```
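Once the model is created, you can also call it programmatically. A minimal sketch using the `ollama` Python client (`pip install ollama`), assuming the `menda-3b-750` model built above is available on a local Ollama server:

```python
import ollama

# Generate against the local Ollama server (default: http://localhost:11434)
response = ollama.generate(
    model="menda-3b-750",
    prompt="Explain the concept of machine learning in simple terms.",
)
print(response["response"])
```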
## License
This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.