Casual-Autopsy committed on
Commit d9cb6ff · verified · 1 Parent(s): 6ea10f1

Update README.md

Files changed (1)
  1. README.md +100 -23
README.md CHANGED
@@ -1,11 +1,13 @@
  ---
  base_model:
- - Casual-Autopsy/Llama-3-Shisa-Minus-Base
- - Casual-Autopsy/Llama-3-Youko-Minus-Base
- - Casual-Autopsy/Llama-3-Minus-Base
- - Casual-Autopsy/Llama-3-Yollow-SCE-TopK_1.0
- - Casual-Autopsy/vntl-qlora
- - Casual-Autopsy/Llama-3-Swallow-Minus-Base
  library_name: transformers
  tags:
  - mergekit
@@ -16,48 +18,123 @@ tags:

  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Casual-Autopsy/Llama-3-Yollow-SCE-TopK_1.0](https://huggingface.co/Casual-Autopsy/Llama-3-Yollow-SCE-TopK_1.0) + [Casual-Autopsy/vntl-qlora](https://huggingface.co/Casual-Autopsy/vntl-qlora) as a base.

- ### Models Merged

- The following models were included in the merge:
- * [Casual-Autopsy/Llama-3-Shisa-Minus-Base](https://huggingface.co/Casual-Autopsy/Llama-3-Shisa-Minus-Base)
- * [Casual-Autopsy/Llama-3-Youko-Minus-Base](https://huggingface.co/Casual-Autopsy/Llama-3-Youko-Minus-Base)
- * [Casual-Autopsy/Llama-3-Minus-Base](https://huggingface.co/Casual-Autopsy/Llama-3-Minus-Base)
- * [Casual-Autopsy/Llama-3-Swallow-Minus-Base](https://huggingface.co/Casual-Autopsy/Llama-3-Swallow-Minus-Base)

- ### Configuration

- The following YAML configuration was used to produce this model:

  ```yaml
  models:
  # Base
- - model: Casual-Autopsy/Llama-3-Yollow-SCE-TopK_1.0+Casual-Autopsy/vntl-qlora
    parameters:
      weight: 1.0
  # Models
- - model: Casual-Autopsy/Llama-3-Minus-Base
    parameters:
      density: 0.35
      weight: 10e-5
- - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base
    parameters:
      density: 0.85
      weight: 25e-5
- - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base
    parameters:
      density: 0.85
      weight: 25e-5
- - model: Casual-Autopsy/Llama-3-Youko-Minus-Base
    parameters:
      density: 0.85
      weight: 25e-5
  merge_method: ties
- base_model: Casual-Autopsy/Llama-3-Yollow-SCE-TopK_1.0+Casual-Autopsy/vntl-qlora
  parameters:
    normalize: false
    int8_mask: false
  ```
 
  ---
  base_model:
+ - meta-llama/Meta-Llama-3-8B
+ - meta-llama/Meta-Llama-3-8B-Instruct
+ - rinna/llama-3-youko-8b
+ - rinna/llama-3-youko-8b-instruct
+ - tokyotech-llm/Llama-3-Swallow-8B-v0.1
+ - tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
+ - shisa-ai/shisa-v1-llama3-8b
+ - lmg-anon/vntl-llama3-8b-v2-qlora
  library_name: transformers
  tags:
  - mergekit
 

  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
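
Any of the configurations listed below can be re-run with mergekit itself. A minimal sketch using mergekit's documented Python API (the `mergekit-yaml config.yaml ./output` CLI is the equivalent one-liner; file paths here are placeholders, not files shipped with this repo):

```python
# Minimal sketch of running one of the YAML configs below with mergekit's
# Python API; "merge-config.yaml" and "./output-model" are placeholders.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge-config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./output-model",
    options=MergeOptions(cuda=False, copy_tokenizer=True),
)
```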

+ ## Configuration

+ The following YAML configurations were used to produce this model:

+ ### Llama-3-Yollow-8B
+ ```yaml
+ models:
+ # Pivot model
+ - model: meta-llama/Meta-Llama-3-8B
+ # Target models
+ - model: rinna/llama-3-youko-8b
+ - model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
+ merge_method: sce
+ base_model: meta-llama/Meta-Llama-3-8B
+ parameters:
+   select_topk: 1.0
+ dtype: float32
+ ```
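
For context on `select_topk`: mergekit's SCE method ranks parameter positions by how much the target models' deltas disagree and keeps only the top fraction, so `select_topk: 1.0` drops nothing. A toy illustration of that selection step under that assumption (not mergekit's actual implementation):

```python
# Toy illustration of SCE's selection step, assuming it keeps the top-k
# fraction of positions ranked by variance across the target models'
# deltas; this is NOT mergekit's actual implementation.
import torch

def sce_select_mask(deltas: torch.Tensor, select_topk: float) -> torch.Tensor:
    """deltas: [n_models, n_params] task vectors (target - pivot)."""
    variance = deltas.var(dim=0)                     # disagreement per position
    k = max(1, int(select_topk * variance.numel()))  # how many positions to keep
    mask = torch.zeros_like(variance, dtype=torch.bool)
    mask[torch.topk(variance, k).indices] = True
    return mask

deltas = torch.randn(2, 10)                # toy Youko and Swallow deltas
print(sce_select_mask(deltas, 1.0).all())  # select_topk: 1.0 keeps everything
```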
+
+ ### Llama-3-Minus-Base-8B
+ ```yaml
+ models:
+ # Finetune model
+ - model: meta-llama/Meta-Llama-3-8B-Instruct
+   parameters:
+     weight: 1.0
+ # Base model
+ - model: meta-llama/Meta-Llama-3-8B
+   parameters:
+     weight: -1.0
+ merge_method: task_arithmetic
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ parameters:
+   normalize: false
+ dtype: float32
+ ```
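
What this config (and the three `*-Minus-Base` configs that follow) computes, assuming mergekit's `task_arithmetic` semantics of `result = base_model + Σ wᵢ · (modelᵢ − base_model)`: the base_model's own task vector is zero, so only the `weight: -1.0` on the raw Meta-Llama-3-8B contributes, pushing the finetune further from the raw base by its own tuning delta. A per-tensor sketch:

```python
# Per-tensor sketch of the config above, assuming task_arithmetic computes
# result = base_model + sum_i w_i * (model_i - base_model).
import torch

llama3_base = torch.randn(4, 4)                    # Meta-Llama-3-8B (stand-in)
instruct = llama3_base + 0.01 * torch.randn(4, 4)  # -Instruct (stand-in)

# base_model is the Instruct model, so its own task vector is zero and only
# the weight -1.0 on the raw base contributes:
result = instruct + 1.0 * (instruct - instruct) + (-1.0) * (llama3_base - instruct)

# i.e. the "Minus-Base" model amplifies the instruction-tuning delta:
assert torch.allclose(result, instruct + (instruct - llama3_base))
```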

+ ### Llama-3-Youko-Minus-Base-8B
+ ```yaml
+ models:
+ # Finetune model
+ - model: rinna/llama-3-youko-8b-instruct
+   parameters:
+     weight: 1.0
+ # Base model
+ - model: meta-llama/Meta-Llama-3-8B
+   parameters:
+     weight: -1.0
+ merge_method: task_arithmetic
+ base_model: rinna/llama-3-youko-8b-instruct
+ parameters:
+   normalize: false
+ dtype: float32
+ ```

+ ### Llama-3-Swallow-Minus-Base-8B
+ ```yaml
+ models:
+ # Finetune model
+ - model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
+   parameters:
+     weight: 1.0
+ # Base model
+ - model: meta-llama/Meta-Llama-3-8B
+   parameters:
+     weight: -1.0
+ merge_method: task_arithmetic
+ base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
+ parameters:
+   normalize: false
+ dtype: float32
+ ```

+ ### Llama-3-Shisa-Minus-Base-8B
+ ```yaml
+ models:
+ # Finetune model
+ - model: shisa-ai/shisa-v1-llama3-8b
+   parameters:
+     weight: 1.0
+ # Base model
+ - model: meta-llama/Meta-Llama-3-8B
+   parameters:
+     weight: -1.0
+ merge_method: task_arithmetic
+ base_model: shisa-ai/shisa-v1-llama3-8b
+ parameters:
+   normalize: false
+ dtype: float32
+ ```

+ ### Llama-3-VNTL-Yollisa-8B
  ```yaml
  models:
  # Base
+ - model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
    parameters:
      weight: 1.0
  # Models
+ - model: Casual-Autopsy/Llama-3-Minus-Base-8B
    parameters:
      density: 0.35
      weight: 10e-5
+ - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
+ - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
+ - model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  merge_method: ties
+ base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
  parameters:
    normalize: false
    int8_mask: false
  ```
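
A note on the `+` in the base model line: mergekit reads `model+adapter` as "apply the LoRA adapter to the model before merging". Outside mergekit, the same composition can be sketched with peft's `merge_and_unload()` (model IDs as in the config; the output path is hypothetical):

```python
# Sketch of what "Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora"
# denotes: the VNTL QLoRA applied on top of the Yollow SCE merge.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Casual-Autopsy/Llama-3-Yollow-8B")
lora = PeftModel.from_pretrained(base, "lmg-anon/vntl-llama3-8b-v2-qlora")
merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("./Llama-3-Yollow-VNTL-8B")  # hypothetical output path
```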