AlekseiPravdin committed on
Commit 6a7e874 · verified · 1 Parent(s): bd6cb64

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +57 -0
  2. dare_config.yaml +16 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ tags:
+ - merge
+ - mergekit
+ - lazymergekit
+ - mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
+ - VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
+ ---
+
+ # Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-dare-merge
+
+ Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-dare-merge is a merge of two models: [mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib](https://huggingface.co/mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib) and [VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct). The merge was produced with [mergekit](https://github.com/cg123/mergekit), a toolkit for merging pretrained language models, using the configuration below.
+
+ ## 🧩 Merge Configuration
+
+ ```yaml
+ slices:
+   - sources:
+       - model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
+         layer_range: [0, 31]
+       - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
+         layer_range: [0, 31]
+ merge_method: dare
+ base_model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
+ dtype: float16
+ ```
+
+ ## Model Features
+
+ This merge combines the HQQ-quantized version of Llama-3.1-8B-Instruct with the SauerkrautLM variant, which is fine-tuned on German-English data. It is intended for general text generation in German and English, with applications ranging from conversational agents to content creation.
+
+ ## Evaluation Results
+
+ No benchmark results for the merged model are reported here. Reported results for the parent models:
+
+ - **mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib**:
+   - ARC (25-shot): 60.49
+   - HellaSwag (10-shot): 80.16
+   - MMLU (5-shot): 68.98
+   - Average performance: 69.51 (the three scores above alone average 69.88, so this figure presumably covers additional benchmarks not listed)
+
+ - **VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct**:
+   - Fine-tuned on German-English data to improve multilingual capability; no scores are listed here.
+
+ The merged model is intended to inherit the strengths of both parents: stronger multilingual performance from SauerkrautLM and the efficiency of the HQQ-quantized model.
+
55
+ ## Limitations
56
+
57
+ While the merged model benefits from the strengths of both parent models, it may also carry over some limitations. For instance, the potential for uncensored content remains a concern, as noted in the SauerkrautLM documentation. Additionally, the model's performance may vary depending on the specific task and the quality of the input data. Users should be aware of these factors when deploying the model in real-world applications.
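The `t` entries in the merge configuration give five values per filter rather than a single number. mergekit's documentation describes list-valued parameters as gradients that are interpolated across layers. The following is a minimal sketch of that reading, assuming evenly spaced anchor points, linear interpolation, and the 31 layers implied by `layer_range: [0, 31]`. The helper name `expand_gradient` is hypothetical and this is not mergekit's own code.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate evenly spaced anchor values across num_layers layers.

    Illustrative only: the exact layer-to-position mapping is an assumption.
    """
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor index scale 0 .. len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)  # left anchor of the segment
        frac = pos - lo                       # distance into the segment
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


# Anchor lists copied from the config; 31 layers assumed from layer_range.
self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1], 31)
mlp_t = expand_gradient([1, 0.5, 0.7, 0.3, 0], 31)

for layer in (0, 15, 30):
    print(layer, round(self_attn_t[layer], 3), round(mlp_t[layer], 3))
```

Under this reading the two lists mirror each other: at every layer the `self_attn` and `mlp` values sum to 1, and tensors matching neither filter fall back to the constant 0.5.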
dare_config.yaml ADDED
@@ -0,0 +1,16 @@
+ slices:
+   - sources:
+       - model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
+         layer_range: [0, 31]
+       - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
+         layer_range: [0, 31]
+ merge_method: dare
+ base_model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
+ dtype: float16
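`merge_method: dare` names DARE (drop and rescale), which works on the difference between a fine-tuned model's weights and the base model's: each delta entry is dropped at random, and the survivors are rescaled so the expected delta is unchanged. The following is a minimal sketch of that single step on plain Python lists. The helper name `dare_apply` and the drop rate of 0.5 are hypothetical, since this config sets no drop rate or density. mergekit's documentation lists its DARE variants as `dare_ties` and `dare_linear`, so the sketch makes no claim about how the bare `dare` value is resolved.

```python
import random


def dare_apply(base, tuned, drop_rate, rng):
    """Drop each delta entry with probability drop_rate; rescale kept entries.

    Hypothetical helper for illustration, not mergekit's implementation.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)  # keeps the expected delta unchanged
    merged = []
    for b, t in zip(base, tuned):
        delta = t - b
        keep = rng.random() >= drop_rate
        merged.append(b + (delta * scale if keep else 0.0))
    return merged


rng = random.Random(0)      # fixed seed so the run is repeatable
base = [0.0] * 10000        # toy "base model" weights
tuned = [1.0] * 10000       # toy "fine-tuned" weights, so every delta is 1.0
merged = dare_apply(base, tuned, drop_rate=0.5, rng=rng)
mean_delta = sum(merged) / len(merged)
print(f"mean delta after drop and rescale: {mean_delta:.3f}")
```

With a drop rate of 0.5, every surviving delta is doubled, so individual weights are either untouched base values or overshoot the fine-tuned value, while the average delta stays close to the original.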