---
base_model:
- Qwen/Qwen2.5-Math-7B
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B
library_name: transformers
tags:
- mergekit
- merge
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
# Qwen2.5-7B-Instruct-Math-task-arithmetic

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
28
+
29
+ ## Performance
30
+ | Metric |Value|
31
+ |---------------------------------|----:|
32
+ |GSM8k (zero-shot) |91.35|
33
+ |HellaSwag (zero-Shot) |80.01|
34
+ |MBPP (zero-shot) |61.01|
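
The card does not state which evaluation harness produced these numbers. One common way to obtain comparable zero-shot scores is EleutherAI's lm-evaluation-harness; the command below is a sketch, where `<repo-id>` is a placeholder for wherever this merge is hosted, and the exact task names and scoring setup may differ from what was used here.

```shell
pip install lm-eval

# <repo-id> is a placeholder, not the actual repository of this model.
lm_eval --model hf \
  --model_args pretrained=<repo-id>,dtype=bfloat16 \
  --tasks gsm8k,hellaswag,mbpp \
  --num_fewshot 0
```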

## Merge Details
### Merge Method

This model was merged with the [Task Arithmetic](https://arxiv.org/abs/2212.04089) method, using [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the base model.

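Conceptually, task arithmetic builds a "task vector" for each fine-tuned model (its weights minus the base weights), takes a weighted sum of those vectors, scales it by `lambda`, and adds the result back onto the base. The sketch below illustrates this on toy NumPy tensors using the `lambda` and `weight` values from the configuration in this card; the function name, the exact effect of `normalize`, and the toy values are illustrative assumptions, not mergekit internals.

```python
import numpy as np

def task_arithmetic(base, tuned, weights, lam, normalize=True):
    """Merge fine-tuned tensors into `base` via their task vectors.

    Illustrative sketch only; mergekit's actual implementation may differ.
    """
    weights = np.asarray(weights, dtype=np.float64)
    if normalize:
        # Assumed reading of `normalize: 1.0`: rescale weights to sum to 1.
        weights = weights / weights.sum()
    # Task vector = fine-tuned weights minus base weights.
    delta = sum(w * (t - base) for w, t in zip(weights, tuned))
    return base + lam * delta

base = np.array([1.0, 2.0])            # stands in for Qwen2.5-7B
math_model = np.array([1.5, 2.0])      # stands in for Qwen2.5-Math-7B
instruct_model = np.array([1.0, 3.0])  # stands in for Qwen2.5-7B-Instruct

merged = task_arithmetic(
    base,
    [math_model, instruct_model],
    weights=[0.11841208483160265, 0.7783861791140264],
    lam=0.7870041304118442,
)
```

With all weights applied to identical models, the task vectors vanish and the merge returns the base unchanged, which is a quick sanity check on the formula.
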
### Models Merged

The following models were included in the merge:
* [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
* [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: Qwen/Qwen2.5-7B
dtype: bfloat16
merge_method: task_arithmetic
parameters:
  lambda: 0.7870041304118442
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-Math-7B
    parameters:
      weight: 0.11841208483160265
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.7783861791140264
```
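
Assuming mergekit is installed, a merge like this can be reproduced by saving the YAML above to a file and running mergekit's `mergekit-yaml` CLI; the config filename and output directory below are placeholders, and the run downloads all three source models.

```shell
pip install mergekit

# Save the YAML configuration above as config.yml, then:
mergekit-yaml config.yml ./Qwen2.5-7B-Instruct-Math-task-arithmetic
```

The resulting directory can then be loaded with `transformers` like any other local checkpoint.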