Files changed (1)
  1. README.md +85 -73
README.md CHANGED
@@ -1,74 +1,86 @@
- ---
- base_model:
- - CultriX/SeQwence-14B
- - VAGOsolutions/SauerkrautLM-v2-14b-DPO
- - v000000/Qwen2.5-Lumen-14B
- - CultriX/Qwen2.5-14B-Wernicke
- - Qwen/Qwen2.5-14B
- - CultriX/Qwen2.5-14B-MegaMerge-pt2
- library_name: transformers
- tags:
- - mergekit
- - merge
- license: apache-2.0
- language:
- - en
- metrics:
- - accuracy
- pipeline_tag: text-generation
- ---
- # merge
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) as a base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [CultriX/SeQwence-14B](https://huggingface.co/CultriX/SeQwence-14B)
- * [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO)
- * [v000000/Qwen2.5-Lumen-14B](https://huggingface.co/v000000/Qwen2.5-Lumen-14B)
- * [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke)
- * [CultriX/Qwen2.5-14B-MegaMerge-pt2](https://huggingface.co/CultriX/Qwen2.5-14B-MegaMerge-pt2)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
-
- models:
-   - model: CultriX/Qwen2.5-14B-Wernicke
-     parameters:
-       weight: 0.35 # Strong performance in GPQA, MUSR, and MMLU-PRO
-       density: 0.6 # Retain 60% of significant parameters
-   - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
-     parameters:
-       weight: 0.30 # Exceptional IFEval and MATH Level 5 capabilities
-       density: 0.6 # Retain 60% of significant parameters
-   - model: CultriX/Qwen2.5-14B-MegaMerge-pt2
-     parameters:
-       weight: 0.20 # Balanced contributions to Truthful QA and MMLU
-       density: 0.5 # Retain 50% of significant parameters
-   - model: CultriX/SeQwence-14B
-     parameters:
-       weight: 0.15 # Provides diverse data and generalization
-       density: 0.4 # Retain 40% of significant parameters
-   - model: v000000/Qwen2.5-Lumen-14B
-     parameters:
-       weight: 0.10 # Enhances creative and narrative tasks
-       density: 0.5 # Retain 50% for task diversity
- base_model: Qwen/Qwen2.5-14B
- merge_method: dare_ties
- parameters:
-   normalize: true # Ensures parameter scaling compatibility
-   int8_mask: true # Optimizes memory and computational efficiency
- dtype: bfloat16
- tokenizer_source: Qwen/Qwen2.5-14B-Instruct
-
-
 
+ ---
+ base_model:
+ - CultriX/SeQwence-14B
+ - VAGOsolutions/SauerkrautLM-v2-14b-DPO
+ - v000000/Qwen2.5-Lumen-14B
+ - CultriX/Qwen2.5-14B-Wernicke
+ - Qwen/Qwen2.5-14B
+ - CultriX/Qwen2.5-14B-MegaMerge-pt2
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ metrics:
+ - accuracy
+ pipeline_tag: text-generation
+ ---
+ # merge
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) as a base.
+
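+ For intuition, the sketch below illustrates the two ideas these papers combine: DARE randomly drops a fraction of each model's task vector (its difference from the base) and rescales the surviving entries by `1 / density`, while TIES keeps only the contributions whose sign agrees with the per-parameter majority. This is a toy numpy illustration under those assumptions, not mergekit's implementation; the function names and data are made up.
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+
+ def dare_prune(delta, density):
+     """Randomly zero out a fraction (1 - density) of a task vector and rescale the rest."""
+     keep = rng.random(delta.shape) < density
+     return np.where(keep, delta / density, 0.0)
+
+ def dare_ties_merge(base, finetuned, weights, densities):
+     # Task vectors: difference between each fine-tuned model and the base.
+     deltas = [w * dare_prune(ft - base, d)
+               for ft, w, d in zip(finetuned, weights, densities)]
+     stacked = np.stack(deltas)
+     # Sign election: per parameter, keep only contributions that agree with the majority sign.
+     elected = np.sign(stacked.sum(axis=0))
+     agree = np.sign(stacked) == elected
+     merged_delta = np.where(agree, stacked, 0.0).sum(axis=0)
+     return base + merged_delta
+
+ # Toy example with a single 4-parameter "layer".
+ base = np.zeros(4)
+ finetuned = [base + rng.normal(size=4) for _ in range(3)]
+ print(dare_ties_merge(base, finetuned, weights=[0.35, 0.30, 0.20], densities=[0.6, 0.6, 0.5]))
+ ```
+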
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [CultriX/SeQwence-14B](https://huggingface.co/CultriX/SeQwence-14B)
+ * [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO)
+ * [v000000/Qwen2.5-Lumen-14B](https://huggingface.co/v000000/Qwen2.5-Lumen-14B)
+ * [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke)
+ * [CultriX/Qwen2.5-14B-MegaMerge-pt2](https://huggingface.co/CultriX/Qwen2.5-14B-MegaMerge-pt2)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+
+ models:
+   - model: CultriX/Qwen2.5-14B-Wernicke
+     parameters:
+       weight: 0.35 # Strong performance in GPQA, MUSR, and MMLU-PRO
+       density: 0.6 # Retain 60% of significant parameters
+   - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
+     parameters:
+       weight: 0.30 # Exceptional IFEval and MATH Level 5 capabilities
+       density: 0.6 # Retain 60% of significant parameters
+   - model: CultriX/Qwen2.5-14B-MegaMerge-pt2
+     parameters:
+       weight: 0.20 # Balanced contributions to Truthful QA and MMLU
+       density: 0.5 # Retain 50% of significant parameters
+   - model: CultriX/SeQwence-14B
+     parameters:
+       weight: 0.15 # Provides diverse data and generalization
+       density: 0.4 # Retain 40% of significant parameters
+   - model: v000000/Qwen2.5-Lumen-14B
+     parameters:
+       weight: 0.10 # Enhances creative and narrative tasks
+       density: 0.5 # Retain 50% for task diversity
+ base_model: Qwen/Qwen2.5-14B
+ merge_method: dare_ties
+ parameters:
+   normalize: true # Ensures parameter scaling compatibility
+   int8_mask: true # Optimizes memory and computational efficiency
+ dtype: bfloat16
+ tokenizer_source: Qwen/Qwen2.5-14B-Instruct
+
+
  ```
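+
+ To load the resulting merge for inference, a standard transformers snippet like the one below should work. This is a sketch: `your-namespace/your-merge-name` is a placeholder for the repository id this merge is published under, and `device_map="auto"` assumes the `accelerate` package is installed.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "your-namespace/your-merge-name"  # placeholder; substitute the actual repo id of this merge
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
+     device_map="auto",           # requires accelerate
+ )
+
+ prompt = "Briefly explain what a model merge is."
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```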