weathermanj committed
Commit aff308a · verified · 1 Parent(s): e5b61b5

Upload README.md with huggingface_hub

Files changed (1): README.md (+119 -53)
README.md CHANGED
@@ -1,98 +1,123 @@
 ---
-language:
-- en
 license: other
-base_model: Qwen/Qwen2.5-3B-Instruct
 tags:
 - qwen
 - grpo
-- reinforcement-learning
-- instruction-tuning
-- mathematical-reasoning
-- gsm8k
 datasets:
 - gsm8k
 model-index:
 - name: Menda-3B-500
   results:
   - task:
-      type: multiple-choice-qa
       name: ARC-Challenge
     metrics:
     - name: Accuracy
       type: accuracy
       value: 50.0
   - task:
-      type: multiple-choice-qa
       name: BoolQ
     metrics:
     - name: Accuracy
       type: accuracy
       value: 90.0
   - task:
-      type: multiple-choice-qa
       name: HellaSwag
     metrics:
     - name: Accuracy
       type: accuracy
       value: 40.0
   - task:
-      type: multiple-choice-qa
-      name: Lambada
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 70.0
-  - task:
-      type: multiple-choice-qa
-      name: PIQA
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 90.0
-  - task:
-      type: multiple-choice-qa
-      name: Winogrande
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 90.0
-  - task:
       type: mmlu
-      name: MMLU
     metrics:
-    - name: Average
       type: accuracy
       value: 68.60
 ---
 
-# Menda-3B-500
 
-Menda-3B-500 is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) trained with GRPO (Group Relative Policy Optimization). This model represents the 500-step checkpoint from the training process.
 
 ## Model Details
 
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Training Method**: GRPO (Group Relative Policy Optimization)
 - **Training Steps**: 500
-- **Parameters**: 3B
 - **Context Length**: 32K tokens
 - **Training Data**: GSM8K (mathematical reasoning)
 
-## Performance
 
-Based on extensive evaluation, the 500-step checkpoint shows strong and balanced performance across multiple benchmarks:
 
-### Core Benchmarks (0-shot)
 
-| Benchmark | Score |
-|-----------|-------|
-| ARC-Challenge | 50.0% |
-| BoolQ | 90.0% |
-| HellaSwag | 40.0% |
-| Lambada | 70.0% |
-| PIQA | 90.0% |
-| Winogrande | 90.0% |
 
 ### MMLU Performance
 
@@ -112,24 +137,44 @@ Based on extensive evaluation, the 500-step checkpoint shows strong and balanced performance across multiple benchmarks:
 - **Efficient Training**: Achieves impressive results with relatively minimal training (500 steps).
 - **Stable Knowledge**: Maintains strong MMLU performance (68.60%) across diverse knowledge domains.
 
-## Usage
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 model_name = "weathermanj/Menda-3B-500"
-
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
     device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
-prompt = "Give me a short introduction to large language models."
 messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": prompt}
 ]
 text = tokenizer.apply_chat_template(
     messages,
@@ -150,6 +195,27 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```
 
 ## Training Configuration
 
 The model was trained using the GRPO methodology with the following configuration:
@@ -163,4 +229,4 @@ The model was trained using the GRPO methodology with the following configuration:
 
 ## License
 
-This model is subject to the license of the original Qwen2.5-3B-Instruct model.
README.md (new version):
 ---
+language: en
 license: other
 tags:
 - qwen
 - grpo
+- instruct
+- fine-tuned
+- reasoning
+- 3b
+- menda
+- chat
+- transformers
+library_name: transformers
 datasets:
 - gsm8k
 model-index:
 - name: Menda-3B-500
   results:
   - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      type: arc-challenge
       name: ARC-Challenge
     metrics:
     - name: Accuracy
       type: accuracy
       value: 50.0
   - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      type: boolq
       name: BoolQ
     metrics:
     - name: Accuracy
       type: accuracy
       value: 90.0
   - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      type: hellaswag
       name: HellaSwag
     metrics:
     - name: Accuracy
       type: accuracy
       value: 40.0
   - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
       type: mmlu
+      name: MMLU (Overall)
     metrics:
+    - name: Accuracy
       type: accuracy
       value: 68.60
 ---
 
+# Menda-3B-500: GRPO-Tuned Qwen2.5 Model
 
+Menda-3B-500 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 500 steps. This model shows improved performance on reasoning benchmarks compared to the base model.
 
 ## Model Details
 
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Training Method**: GRPO (Group Relative Policy Optimization)
 - **Training Steps**: 500
+- **Parameters**: 3 billion
 - **Context Length**: 32K tokens
 - **Training Data**: GSM8K (mathematical reasoning)
+- **Chat Template**: Uses the Qwen2 chat template
+
+## Chat Format
+
+This model uses the standard Qwen2 chat template. For best results when prompting the model directly, format your prompts as follows:
+
+```
+<|im_start|>system
+You are a helpful AI assistant.<|im_end|>
+<|im_start|>user
+Your question here<|im_end|>
+<|im_start|>assistant
+```
+
+When using the model through the Hugging Face Transformers library, the chat template stored with the tokenizer is applied for you by `apply_chat_template`:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "weathermanj/Menda-3B-500"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Explain the concept of machine learning in simple terms."}
+]
+
+# add_generation_prompt=True appends the assistant header so the model
+# answers the user turn instead of continuing it
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=300)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
 
+## Benchmark Results
 
+Menda-3B-500 has been evaluated on several standard benchmarks:
+
+| Benchmark | Task Type | Accuracy |
+|-----------|-----------|----------|
+| ARC-Challenge | Scientific Reasoning | 50.0% |
+| BoolQ | Reading Comprehension | 90.0% |
+| HellaSwag | Commonsense Reasoning | 40.0% |
+| Lambada | Text Completion | 70.0% |
+| PIQA | Physical Reasoning | 90.0% |
+| Winogrande | Commonsense Reasoning | 90.0% |
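+
+These zero-shot scores should be roughly reproducible with EleutherAI's lm-evaluation-harness. The snippet below is an editorial sketch, not part of the original card; it assumes the harness v0.4 `simple_evaluate` API and its standard task names:
+
+```python
+import lm_eval
+
+# Run the benchmarks from the table above, zero-shot
+results = lm_eval.simple_evaluate(
+    model="hf",
+    model_args="pretrained=weathermanj/Menda-3B-500,dtype=auto",
+    tasks=["arc_challenge", "boolq", "hellaswag", "lambada_openai", "piqa", "winogrande"],
+    num_fewshot=0,
+)
+for task, metrics in results["results"].items():
+    print(task, metrics)
+```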
 
 ### MMLU Performance
 
 - **Efficient Training**: Achieves impressive results with relatively minimal training (500 steps).
 - **Stable Knowledge**: Maintains strong MMLU performance (68.60%) across diverse knowledge domains.
 
+## Usage Examples
+
+### Basic Usage with Transformers
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 model_name = "weathermanj/Menda-3B-500"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
     device_map="auto"
 )
+
+prompt = "Explain the concept of machine learning in simple terms."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=300)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
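+
+Equivalently, the high-level `pipeline` API wraps the same load-generate-decode steps; this variant is an editorial sketch rather than part of the original card:
+
+```python
+from transformers import pipeline
+
+# The text-generation pipeline handles tokenization, generation, and decoding
+generator = pipeline(
+    "text-generation",
+    model="weathermanj/Menda-3B-500",
+    torch_dtype="auto",
+    device_map="auto",
+)
+result = generator("Explain the concept of machine learning in simple terms.", max_new_tokens=300)
+print(result[0]["generated_text"])
+```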
 
+### Chat Usage with Transformers
 
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "weathermanj/Menda-3B-500"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
 
 messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Give me a short introduction to large language models."}
 ]
 text = tokenizer.apply_chat_template(
     messages,
     tokenize=False,
     add_generation_prompt=True
 )
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
 generated_ids = model.generate(
     **model_inputs,
     max_new_tokens=512
 )
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
 ]
 
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```
 
+### Using with Ollama
+
+You can also use this model with Ollama by converting it to GGUF format:
+
+```bash
+# Download a local snapshot, then convert it to GGUF with the
+# convert_hf_to_gguf.py script from the llama.cpp repository
+huggingface-cli download weathermanj/Menda-3B-500 --local-dir Menda-3B-500
+python convert_hf_to_gguf.py Menda-3B-500 --outfile menda-3b-500.gguf
+
+# Create an Ollama model; the template mirrors the Qwen2 chat format above
+cat > Modelfile << 'EOF'
+FROM menda-3b-500.gguf
+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"""
+PARAMETER temperature 0.7
+PARAMETER top_p 0.9
+PARAMETER top_k 40
+PARAMETER stop "<|im_end|>"
+EOF
+
+ollama create menda-3b-500 -f Modelfile
+ollama run menda-3b-500
+```
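+
+Once registered, the model can also be queried over Ollama's local HTTP API; a minimal sketch assuming the standard `/api/generate` endpoint on the default port (11434):
+
+```python
+import json
+import urllib.request
+
+# Send one prompt to the locally served model and print the reply
+req = urllib.request.Request(
+    "http://localhost:11434/api/generate",
+    data=json.dumps({
+        "model": "menda-3b-500",
+        "prompt": "Why is the sky blue?",
+        "stream": False,
+    }).encode(),
+    headers={"Content-Type": "application/json"},
+)
+with urllib.request.urlopen(req) as resp:
+    print(json.loads(resp.read())["response"])
+```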
+
 ## Training Configuration
 
 The model was trained using the GRPO methodology with the following configuration:
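+
+(The full hyperparameter list falls between the hunks shown in this diff.) As a rough illustration of what a 500-step GRPO run on GSM8K can look like, here is a minimal sketch using TRL's `GRPOTrainer`; the reward function, prompt mapping, and all hyperparameters below are assumptions for illustration, not the author's actual training script:
+
+```python
+from datasets import load_dataset
+from trl import GRPOConfig, GRPOTrainer
+
+# GSM8K solutions end with "#### <answer>"; keep the question as the prompt
+# and the final number as a target for a simple correctness reward.
+dataset = load_dataset("gsm8k", "main", split="train")
+dataset = dataset.map(lambda x: {
+    "prompt": x["question"],
+    "target": x["answer"].split("####")[-1].strip(),
+})
+
+def correctness_reward(completions, target, **kwargs):
+    # 1.0 when the gold final answer appears in the completion, else 0.0
+    return [1.0 if t in c else 0.0 for c, t in zip(completions, target)]
+
+config = GRPOConfig(output_dir="menda-3b-500", max_steps=500)
+trainer = GRPOTrainer(
+    model="Qwen/Qwen2.5-3B-Instruct",
+    reward_funcs=correctness_reward,
+    args=config,
+    train_dataset=dataset,
+)
+trainer.train()
+```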
 
 
 ## License
 
+This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.