mlabonne and lbourdois committed
Commit d77b3cb · verified · 1 Parent(s): c2c8370

Improve language tag (#3)


- Improve language tag (01714fc8d10f2f838b67a6263921b089d6da502b)


Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1): README.md (+341 -335)
---
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- lazymergekit
base_model:
- Qwen/Qwen2.5-32B-Instruct
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
model-index:
- name: BigQwen2.5-Echo-47B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 73.57
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 44.52
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 3.47
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 10.19
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 41.49
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-Echo-47B-Instruct
      name: Open LLM Leaderboard
---

# BigQwen2.5-Echo-47B-Instruct

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/98GiKtmH1AtHHbIbOUH4Y.jpeg)

BigQwen2.5-Echo-47B-Instruct is a [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).

## 🔉 Echo Merge

I've tried a more gradual approach with a **distributed repetition pattern**. Instead of replicating blocks of 8 or more layers, I'm replicating individual layers within these blocks:
- First 8 layers: No replication
- Next 8 layers: Replicate 2 layers (first one, middle one)
- Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
- Next 8 layers: Replicate 8 layers (all of them)
- Next 8 layers: Replicate 8 layers (all of them)
- Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
- Next 8 layers: Replicate 2 layers (first one, middle one)
- Last 8 layers: No replication

I used this string to visualize it, where 0s are original layers and 1s duplicated ones (the order doesn't matter):
```
00000000 1000010000 100100100100 1010101010101010 1010101010101010 100100100100 1000010000 00000000
```
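
If you want to experiment with the pattern, here is a minimal Python sketch that regenerates this string from the per-block duplicate counts (the placement of the 1s inside a block is arbitrary, since the order doesn't matter):

```python
# Rebuild the 0/1 visualization: each block is 8 original layers (0s)
# plus `dups` duplicated layers (1s) spread evenly across the block.
def block_string(dups: int, originals: int = 8) -> str:
    if dups == originals:
        return "10" * originals  # every layer duplicated
    out, placed = [], 0
    step = originals // max(dups, 1)
    for i in range(originals):
        if dups and i % step == 0 and placed < dups:
            out.append("1")
            placed += 1
        out.append("0")
    return "".join(out)

# Duplicates per 8-layer block, mirrored around the middle of the model
blocks = [0, 2, 4, 8, 8, 4, 2, 0]
print(" ".join(block_string(d) for d in blocks))
```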

The main idea is that the difference between a middle layer's input and its output is quite small, so replicating a middle layer has only a small impact on the final output.
The additional layers are designed to increase the model's capacity without breaking the information flow, which is what often produces "insane" self-merges.
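
This intuition is easy to sanity-check by comparing each layer's input and output hidden states. A rough sketch (the checkpoint below is a small placeholder; any causal LM that fits in memory works):

```python
# Cosine similarity between each layer's input and output hidden states:
# values close to 1.0 suggest the layer barely changes the residual stream.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
model.eval()

inputs = tokenizer("What is a large language model?", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[i] is the input of layer i, hidden_states[i + 1] its output
for i in range(len(out.hidden_states) - 1):
    a, b = out.hidden_states[i], out.hidden_states[i + 1]
    sim = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean().item()
    print(f"layer {i:2d}: cos(input, output) = {sim:.3f}")
```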

## 🏆 Evaluation

| Metric |**BigQwen2.5-Echo-47B-Instruct**|BigQwen2.5-52B-Instruct|Qwen2.5-32B-Instruct|
|-------------------|----:|----:|----:|
|Avg.               |30.31|37.42|36.17|
|IFEval (0-Shot)    |73.57|79.29|83.46|
|BBH (3-Shot)       |44.52|59.81|56.49|
|MATH Lvl 5 (4-Shot)| 3.47|17.82| 0.00|
|GPQA (0-shot)      | 8.61| 6.94|11.74|
|MuSR (0-shot)      |10.19|10.45|13.50|
|MMLU-PRO (5-shot)  |41.49|50.22|51.85|

## 🧩 Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  # First 8 layers: No replication
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [0, 8]

  # Next 8 layers: Replicate 2 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [8, 9]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [8, 9]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [9, 13]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [13, 14]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [13, 14]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [14, 16]

  # Next 8 layers: Replicate 4 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [16, 18]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [17, 19]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [18, 20]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [19, 21]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [20, 22]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [21, 23]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [22, 24]

  # Next 8 layers: Replicate all 8 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [24, 25]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [24, 26]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [25, 27]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [26, 28]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [27, 29]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [28, 30]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [29, 31]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [30, 32]

  # Middle 8 layers: Replicate all 8 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [32, 33]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [32, 34]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [33, 35]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [34, 36]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [35, 37]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [36, 38]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [37, 39]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [38, 40]

  # Next 8 layers: Replicate 4 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [40, 42]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [41, 43]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [42, 44]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [43, 45]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [44, 46]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [45, 47]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [46, 48]

  # Next 8 layers: Replicate 2 layers
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [48, 49]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [48, 49]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [49, 53]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [53, 54]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [53, 54]
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [54, 56]

  # Last 8 layers: No replication
  - sources:
    - model: Qwen/Qwen2.5-32B-Instruct
      layer_range: [56, 64]

merge_method: passthrough
dtype: bfloat16
```
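
To reproduce the merge, this config can be fed to mergekit either through the `mergekit-yaml` CLI or through its Python API. A sketch of the latter (paths are placeholders; see the mergekit repo for the full set of options):

```python
# Sketch: run the merge from the YAML above with mergekit's Python API
# (roughly equivalent to `mergekit-yaml config.yaml ./BigQwen2.5-Echo-47B-Instruct`).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", encoding="utf-8") as f:  # the config shown above
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./BigQwen2.5-Echo-47B-Instruct",  # placeholder output directory
    options=MergeOptions(
        copy_tokenizer=True,  # copy the base model's tokenizer to the output
        lazy_unpickle=True,   # reduce peak memory while loading shards
    ),
)
```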

## 💻 Usage

```python
# pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/BigQwen2.5-Echo-47B-Instruct"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```