Fizzarolli committed on
Commit a312fcf · verified · 1 Parent(s): dfd8c7b

Update README.md

  1. README.md +79 -0
README.md CHANGED
@@ -143,4 +143,83 @@ special_tokens:
  pad_token: <pad>
  ```
 
+ ### Mergekit Config
+ ```yaml
+ dtype: bfloat16
+ merge_method: passthrough
+
+ slices:
+   # untouched intro
+   - sources:
+     - layer_range: [0, 8]
+       model: mistralai/Mistral-Nemo-Base-2407
+
+   - sources:
+     - layer_range: [8, 12]
+       model: mistralai/Mistral-Nemo-Base-2407
+   # 8–16 baseline
+   - sources:
+     - layer_range: [8, 16]
+       model: mistralai/Mistral-Nemo-Base-2407
+   # 8–16 duplicate with projections nulled
+   - sources:
+     - layer_range: [8, 16]
+       model: mistralai/Mistral-Nemo-Base-2407
+       parameters:
+         scale:
+           - filter: o_proj
+             value: 0.0
+           - filter: down_proj
+             value: 0.0
+           - value: 1.0
+
+   # 16–24 duplicate
+   - sources:
+     - layer_range: [16, 24]
+       model: mistralai/Mistral-Nemo-Base-2407
+       parameters:
+         scale:
+           - filter: o_proj
+             value: 0.0
+           - filter: down_proj
+             value: 0.0
+           - value: 1.0
+   # 16–24 baseline
+   - sources:
+     - layer_range: [16, 24]
+       model: mistralai/Mistral-Nemo-Base-2407
+   # 16–24 duplicate
+   - sources:
+     - layer_range: [16, 24]
+       model: mistralai/Mistral-Nemo-Base-2407
+       parameters:
+         scale:
+           - filter: o_proj
+             value: 0.0
+           - filter: down_proj
+             value: 0.0
+           - value: 1.0
+
+   # 24–32 baseline
+   - sources:
+     - layer_range: [24, 32]
+       model: mistralai/Mistral-Nemo-Base-2407
+   # 24–32 duplicate
+   - sources:
+     - layer_range: [24, 32]
+       model: mistralai/Mistral-Nemo-Base-2407
+       parameters:
+         scale:
+           - filter: o_proj
+             value: 0.0
+           - filter: down_proj
+             value: 0.0
+           - value: 1.0
+
+   # untouched tail
+   - sources:
+     - layer_range: [32, 40]
+       model: mistralai/Mistral-Nemo-Base-2407
+ ```
+
  </details>
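
A rough way to sanity-check the config added in this commit is to tally how many layers the passthrough merge stacks. The sketch below is not part of the commit. It copies the `layer_range` values from the slices by hand and assumes each range is half-open (`[start, end)`), which is how mergekit is generally understood to read it. The names `SLICES` and `count_layers` are hypothetical, chosen only for this illustration.

```python
# Layer ranges copied by hand from the slices in the config, in order.
# The boolean marks slices whose o_proj and down_proj are scaled to 0.0.
# Assumption: layer_range is half-open, i.e. [start, end).
SLICES = [
    ((0, 8), False),    # untouched intro
    ((8, 12), False),
    ((8, 16), False),   # 8-16 baseline
    ((8, 16), True),    # 8-16 duplicate, projections nulled
    ((16, 24), True),
    ((16, 24), False),  # 16-24 baseline
    ((16, 24), True),
    ((24, 32), False),  # 24-32 baseline
    ((24, 32), True),
    ((32, 40), False),  # untouched tail
]


def count_layers(slices):
    """Return (total layers, layers with zeroed output projections)."""
    total = sum(end - start for (start, end), _ in slices)
    nulled = sum(end - start for (start, end), is_nulled in slices if is_nulled)
    return total, nulled


total, nulled = count_layers(SLICES)
print(f"{total} layers in the merged stack, {nulled} with o_proj/down_proj zeroed")
```

Under that assumption the 40-layer base grows to 76 layers, 32 of which have `o_proj` and `down_proj` scaled to zero. Zeroing those two output projections should make each such layer add nothing to the residual stream at initialization, so the duplicated blocks start out as pass-through layers.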