lbourdois committed
Commit 10bdd9c · verified · 1 Parent(s): a5337a4

Improve language tag


Hi! As the model is multilingual, this is a PR to add languages other than English to the language tag, to improve how the model is referenced. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
README.md (+72 -72)
README.md CHANGED
---
base_model:
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B
- Qwen/Qwen2.5-Math-7B
library_name: transformers
tags:
- mergekit
- merge
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Qwen2.5-7B-Instruct-Math-dare-linear

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

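Because all three source models share the standard Qwen2.5 architecture, the merged checkpoint should load like any other Qwen2.5 instruct-style model. A minimal usage sketch with `transformers` (the repository id below is assumed from the card title; substitute the full `namespace/model` path of this repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen2.5-7B-Instruct-Math-dare-linear"  # assumed id, replace with the full repo path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Chat-style prompt, since the Instruct model carries most of the merge weight.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
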
## Performance

| Metric                 | Value |
|------------------------|------:|
| GSM8k (zero-shot)      | 90.75 |
| HellaSwag (zero-shot)  | 80.77 |
| MBPP (zero-shot)       | 63.08 |

## Merge Details

### Merge Method

This model was merged using the [Linear DARE](https://arxiv.org/abs/2311.03099) merge method, with [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the base.

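For intuition, DARE ("drop and rescale") sparsifies each fine-tuned model's task vector, i.e. its parameter delta from the base, by randomly zeroing entries and rescaling the survivors so the expected value is preserved; the linear variant then adds a weighted sum of those sparsified deltas back onto the base. The sketch below illustrates the idea on toy tensors using rounded values from the configuration further down. It is a simplified illustration with made-up helper names, not mergekit's actual implementation.

```python
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Drop And REscale: randomly zero entries of a task vector, rescale the survivors."""
    mask = torch.bernoulli(torch.full_like(delta, density))  # keep each entry with prob = density
    return delta * mask / density                             # rescale so the expectation is unchanged

def dare_linear(base, tuned, densities, weights, lam):
    """Weighted sum of DARE-processed task vectors, scaled by lambda and added to the base."""
    total = sum(weights)  # "normalize: 1.0" -> weights divided by their sum
    merged = base.clone()
    for t, d, w in zip(tuned, densities, weights):
        merged += lam * (w / total) * dare(t - base, d)
    return merged

# Toy tensors standing in for one weight matrix of each model.
base = torch.randn(4, 4)
math_tuned = base + 0.1 * torch.randn(4, 4)      # stands in for Qwen2.5-Math-7B
instruct_tuned = base + 0.1 * torch.randn(4, 4)  # stands in for Qwen2.5-7B-Instruct

merged = dare_linear(
    base,
    [math_tuned, instruct_tuned],
    densities=[0.846, 0.525],  # rounded from the config below
    weights=[0.111, 0.690],    # rounded from the config below
    lam=0.748,                 # rounded lambda from the config below
)
```
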
### Models Merged

The following models were included in the merge:
* [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
* [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: Qwen/Qwen2.5-7B
dtype: bfloat16
merge_method: dare_linear
parameters:
  lambda: 0.7484721287441042
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-Math-7B
    parameters:
      density: 0.8456557088847347
      weight: 0.11064925820848412
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 0.5247829319933462
      weight: 0.6901952279079901
```
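
To reproduce the merge, this configuration can presumably be saved to a YAML file and passed to mergekit's `mergekit-yaml` command, which writes the merged weights to an output directory; the exact invocation may vary between mergekit versions.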