MikeRoz committed on
Commit
a53bf1b
·
verified ·
1 Parent(s): 3140c6f

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.model.v3 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,18 +1,378 @@
1
  ---
2
  license: other
3
  license_name: mrl
4
  inference: false
5
  license_link: https://mistral.ai/licenses/MRL-0.1.md
6
- base_model: mistralai/Mistral-Large-Instruct-2407
7
- base_model_relation: quantized
8
- tags:
9
- - exl3
10
  ---
11
- [exllamav3](https://github.com/turboderp-org/exllamav3) quantizations of [Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)
12
 
13
- Will update this space with links to the quant branches as they finish uploading. Expect the same sizes as in [turboderp's exl3 quant of Mistral-Large-Instruct-2411](https://huggingface.co/turboderp/Mistral-Large-Instruct-2411-exl3).
14
 
15
- [1.40 bpw/H4](https://huggingface.co/MikeRoz/Mistral-Large-Instruct-2407-exl3/tree/1.4bpw_H4)
16
- [2.25 bpw/H5](https://huggingface.co/MikeRoz/Mistral-Large-Instruct-2407-exl3/tree/2.25bpw_H5)
17
- [2.50 bpw/H5](https://huggingface.co/MikeRoz/Mistral-Large-Instruct-2407-exl3/tree/2.50bpw_H5)
18
- [3.00 bpw/H6](https://huggingface.co/MikeRoz/Mistral-Large-Instruct-2407-exl3/tree/3.0bpw_H6)
 
1
  ---
2
+ language:
3
+ - en
4
+ - fr
5
+ - de
6
+ - es
7
+ - it
8
+ - pt
9
+ - zh
10
+ - ja
11
+ - ru
12
+ - ko
13
  license: other
14
  license_name: mrl
15
  inference: false
16
  license_link: https://mistral.ai/licenses/MRL-0.1.md
17
+ extra_gated_prompt: >-
18
+ # Mistral AI Research License
19
+
20
+ If You want to use a Mistral Model, a Derivative or an Output for any purpose that is not expressly authorized under this Agreement, You must request a license from Mistral AI, which Mistral AI may grant to You in Mistral AI's sole discretion. To discuss such a license, please contact Mistral AI via the website contact form: https://mistral.ai/contact/
21
+
22
+ ## 1. Scope and acceptance
23
+
24
+ **1.1. Scope of the Agreement.** This Agreement applies to any use, modification, or Distribution of any Mistral Model by You, regardless of the source You obtained a copy of such Mistral Model.
25
+
26
+ **1.2. Acceptance.** By accessing, using, modifying, Distributing a Mistral Model, or by creating, using or distributing a Derivative of the Mistral Model, You agree to be bound by this Agreement.
27
+
28
+ **1.3. Acceptance on behalf of a third-party.** If You accept this Agreement on behalf of Your employer or another person or entity, You warrant and represent that You have the authority to act and accept this Agreement on their behalf. In such a case, the word "You" in this Agreement will refer to Your employer or such other person or entity.
29
+
30
+ ## 2. License
31
+
32
+ **2.1. Grant of rights**. Subject to Section 3 below, Mistral AI hereby grants You a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable, limited license to use, copy, modify, and Distribute under the conditions provided in Section 2.2 below, the Mistral Model and any Derivatives made by or for Mistral AI and to create Derivatives of the Mistral Model.
33
+
34
+ **2.2. Distribution of Mistral Model and Derivatives made by or for Mistral AI.** Subject to Section 3 below, You may Distribute copies of the Mistral Model and/or Derivatives made by or for Mistral AI, under the following conditions:
35
+ You must make available a copy of this Agreement to third-party recipients of the Mistral Models and/or Derivatives made by or for Mistral AI you Distribute, it being specified that any rights to use the Mistral Models and/or Derivatives made by or for Mistral AI shall be directly granted by Mistral AI to said third-party recipients pursuant to the Mistral AI Research License agreement executed between these parties;
36
+ You must retain in all copies of the Mistral Models the following attribution notice within a "Notice" text file distributed as part of such copies: "Licensed by Mistral AI under the Mistral AI Research License".
37
+
38
+ **2.3. Distribution of Derivatives made by or for You.** Subject to Section 3 below, You may Distribute any Derivatives made by or for You under additional or different terms and conditions, provided that:
39
+ In any event, the use and modification of Mistral Model and/or Derivatives made by or for Mistral AI shall remain governed by the terms and conditions of this Agreement;
40
+ You include in any such Derivatives made by or for You prominent notices stating that You modified the concerned Mistral Model; and
41
+ Any terms and conditions You impose on any third-party recipients relating to Derivatives made by or for You shall neither limit such third-party recipients' use of the Mistral Model or any Derivatives made by or for Mistral AI in accordance with the Mistral AI Research License nor conflict with any of its terms and conditions.
42
+
43
+ ## 3. Limitations
44
+
45
+ **3.1. Misrepresentation.** You must not misrepresent or imply, through any means, that the Derivatives made by or for You and/or any modified version of the Mistral Model You Distribute under your name and responsibility is an official product of Mistral AI or has been endorsed, approved or validated by Mistral AI, unless You are authorized by Us to do so in writing.
46
+
47
+ **3.2. Usage Limitation.** You shall only use the Mistral Models, Derivatives (whether or not created by Mistral AI) and Outputs for Research Purposes.
48
+
49
+ ## 4. Intellectual Property
50
+
51
+ **4.1. Trademarks.** No trademark licenses are granted under this Agreement, and in connection with the Mistral Models, You may not use any name or mark owned by or associated with Mistral AI or any of its affiliates, except (i) as required for reasonable and customary use in describing and Distributing the Mistral Models and Derivatives made by or for Mistral AI and (ii) for attribution purposes as required by this Agreement.
52
+
53
+ **4.2. Outputs.** We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs You generate and their subsequent uses in accordance with this Agreement. Any Outputs shall be subject to the restrictions set out in Section 3 of this Agreement.
54
+
55
+ **4.3. Derivatives.** By entering into this Agreement, You accept that any Derivatives that You may create or that may be created for You shall be subject to the restrictions set out in Section 3 of this Agreement.
56
+
57
+ ## 5. Liability
58
+
59
+ **5.1. Limitation of liability.** In no event, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall Mistral AI be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this Agreement or out of the use or inability to use the Mistral Models and Derivatives (including but not limited to damages for loss of data, loss of goodwill, loss of expected profit or savings, work stoppage, computer failure or malfunction, or any damage caused by malware or security breaches), even if Mistral AI has been advised of the possibility of such damages.
60
+
61
+ **5.2. Indemnification.** You agree to indemnify and hold harmless Mistral AI from and against any claims, damages, or losses arising out of or related to Your use or Distribution of the Mistral Models and Derivatives.
62
+
63
+ ## 6. Warranty
64
+
65
+ **6.1. Disclaimer.** Unless required by applicable law or prior agreed to by Mistral AI in writing, Mistral AI provides the Mistral Models and Derivatives on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. Mistral AI does not represent nor warrant that the Mistral Models and Derivatives will be error-free, meet Your or any third party's requirements, be secure or will allow You or any third party to achieve any kind of result or generate any kind of content. You are solely responsible for determining the appropriateness of using or Distributing the Mistral Models and Derivatives and assume any risks associated with Your exercise of rights under this Agreement.
66
+
67
+ ## 7. Termination
68
+
69
+ **7.1. Term.** This Agreement is effective as of the date of your acceptance of this Agreement or access to the concerned Mistral Models or Derivatives and will continue until terminated in accordance with the following terms.
70
+
71
+ **7.2. Termination.** Mistral AI may terminate this Agreement at any time if You are in breach of this Agreement. Upon termination of this Agreement, You must cease to use all Mistral Models and Derivatives and shall permanently delete any copy thereof. The following provisions, in their relevant parts, will survive any termination or expiration of this Agreement, each for the duration necessary to achieve its own intended purpose (e.g. the liability provision will survive until the end of the applicable limitation period):Sections 5 (Liability), 6(Warranty), 7 (Termination) and 8 (General Provisions).
72
+
73
+ **7.3. Litigation.** If You initiate any legal action or proceedings against Us or any other entity (including a cross-claim or counterclaim in a lawsuit), alleging that the Model or a Derivative, or any part thereof, infringe upon intellectual property or other rights owned or licensable by You, then any licenses granted to You under this Agreement will immediately terminate as of the date such legal action or claim is filed or initiated.
74
+
75
+ ## 8. General provisions
76
+
77
+ **8.1. Governing laws.** This Agreement will be governed by the laws of France, without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
78
+
79
+ **8.2. Competent jurisdiction.** The courts of Paris shall have exclusive jurisdiction of any dispute arising out of this Agreement.
80
+
81
+ **8.3. Severability.** If any provision of this Agreement is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
82
+
83
+ ## 9. Definitions
84
+
85
+ "Agreement": means this Mistral AI Research License agreement governing the access, use, and Distribution of the Mistral Models, Derivatives and Outputs.
86
+
87
+ "Derivative": means any (i) modified version of the Mistral Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the Mistral Model, or (iii) any other derivative work thereof.
88
+
89
+ "Distribution", "Distributing", "Distribute" or "Distributed": means supplying, providing or making available, by any means, a copy of the Mistral Models and/or the Derivatives as the case may be, subject to Section 3 of this Agreement.
90
+
91
+ "Mistral AI", "We" or "Us": means Mistral AI, a French société par actions simplifiée registered in the Paris commercial registry under the number 952 418 325, and having its registered seat at 15, rue des Halles, 75001 Paris.
92
+
93
+ "Mistral Model": means the foundational large language model(s), and its elements which include algorithms, software, instructed checkpoints, parameters, source code (inference code, evaluation code and, if applicable, fine-tuning code) and any other elements associated thereto made available by Mistral AI under this Agreement, including, if any, the technical documentation, manuals and instructions for the use and operation thereof.
94
+
95
+ "Research Purposes": means any use of a Mistral Model, Derivative, or Output that is solely for (a) personal, scientific or academic research, and (b) for non-profit and non-commercial purposes, and not directly or indirectly connected to any commercial activities or business operations. For illustration purposes, Research Purposes does not include (1) any usage of the Mistral Model, Derivative or Output by individuals or contractors employed in or engaged by companies in the context of (a) their daily tasks, or (b) any activity (including but not limited to any testing or proof-of-concept) that is intended to generate revenue, nor (2) any Distribution by a commercial entity of the Mistral Model, Derivative or Output whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.
96
+
97
+ "Outputs": means any content generated by the operation of the Mistral Models or the Derivatives from a prompt (i.e., text instructions) provided by users. For the avoidance of doubt, Outputs do not include any components of a Mistral Models, such as any fine-tuned versions of the Mistral Models, the weights, or parameters.
98
+
99
+ "You": means the individual or entity entering into this Agreement with Mistral AI.
100
+
101
+
102
+ *Mistral AI processes your personal data below to provide the model and enforce its license. If you are affiliated with a commercial entity, we may also send you communications about our models. For more information on your rights and data handling, please see our <a href="https://mistral.ai/terms/">privacy policy</a>.*
103
+ extra_gated_fields:
104
+ First Name: text
105
+ Last Name: text
106
+ Country: country
107
+ Affiliation: text
108
+ Job title: text
109
+ I understand that I can only use the model, any derivative versions and their outputs for non-commercial research purposes: checkbox
110
+ I understand that if I am a commercial entity, I am not permitted to use or distribute the model internally or externally, or expose it in my own offerings without a commercial license: checkbox
111
+ I understand that if I upload the model, or any derivative version, on any platform, I must include the Mistral Research License: checkbox
112
+ I understand that for commercial use of the model, I can contact Mistral or use the Mistral AI API on la Plateforme or any of our cloud provider partners: checkbox
113
+ ? By clicking Submit below I accept the terms of the license and acknowledge that
114
+ the information I provide will be collected stored processed and shared in accordance
115
+ with the Mistral Privacy Policy
116
+ : checkbox
117
+ geo: ip_location
118
+ extra_gated_description: >-
119
+ Mistral AI processes your personal data below to provide the model and enforce its license. If you are affiliated with a commercial entity, we may also send you communications about our models. For more information on your rights and data handling, please see our <a href="https://mistral.ai/terms/">privacy policy</a>.
120
+ extra_gated_button_content: Submit
121
+ library_name: vllm
122
  ---
 
123
 
124
+ # Model Card for Mistral-Large-Instruct-2407
125
+
126
+ Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
127
+
128
+ For more details about this model please refer to our release [blog post](https://mistral.ai/news/mistral-large-2407/).
129
+
130
+ ## Key features
131
+ - **Multi-lingual by design:** Dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
132
+ - **Proficient in coding:** Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
133
+ - **Agent-centric:** Best-in-class agentic capabilities with native function calling and JSON output.
134
+ - **Advanced Reasoning:** State-of-the-art mathematical and reasoning capabilities.
135
+ - **Mistral Research License:** Allows usage and modification for research and non-commercial usages.
136
+ - **Large Context:** A large 128k context window.
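To put the 128k window in perspective, here is a rough sketch of the key/value cache size at full context. It uses the `num_hidden_layers`, `num_key_value_heads`, `num_attention_heads`, `hidden_size` and `max_position_embeddings` values from this commit's `config.json`. The head dimension (`hidden_size / num_attention_heads`), the 2-byte cache element size (bf16/fp16) and the helper name `kv_cache_bytes` are assumptions made for illustration; real backends may quantize or page the cache differently.

```python
# Values copied from config.json in this commit.
NUM_LAYERS = 88           # num_hidden_layers
NUM_KV_HEADS = 8          # num_key_value_heads
HEAD_DIM = 12288 // 96    # assumed: hidden_size / num_attention_heads
MAX_CONTEXT = 131072      # max_position_embeddings


def kv_cache_bytes(num_tokens: int, bytes_per_element: int = 2) -> int:
    """Estimate KV cache size: one K and one V vector per layer, per KV head, per token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token * num_tokens


if __name__ == "__main__":
    full = kv_cache_bytes(MAX_CONTEXT)
    print(f"{full / 2**30:.1f} GiB of KV cache at {MAX_CONTEXT} tokens")
```

Under these assumptions a single full-length sequence needs about 44 GiB of cache on top of the weights, which is why long-context use is usually paired with a smaller context limit or a quantized cache.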
137
+
138
+ ## Metrics
139
+
140
+ ### Base Pretrained Benchmarks
141
+
142
+ | Benchmark | Score |
143
+ | --- | --- |
144
+ | MMLU | 84.0% |
145
+
146
+
147
+ ### Base Pretrained Multilingual Benchmarks (MMLU)
148
+ | Benchmark | Score |
149
+ | --- | --- |
150
+ | French | 82.8% |
151
+ | German | 81.6% |
152
+ | Spanish | 82.7% |
153
+ | Italian | 82.7% |
154
+ | Dutch | 80.7% |
155
+ | Portuguese | 81.6% |
156
+ | Russian | 79.0% |
157
+ | Korean | 60.1% |
158
+ | Japanese | 78.8% |
159
+ | Chinese | 74.8% |
160
+
161
+
162
+ ### Instruction Benchmarks
163
+
164
+ | Benchmark | Score |
165
+ | --- | --- |
166
+ | MT Bench | 8.63 |
167
+ | Wild Bench | 56.3 |
168
+ | Arena Hard| 73.2 |
169
+
170
+ ### Code & Reasoning Benchmarks
171
+ | Benchmark | Score |
172
+ | --- | --- |
173
+ | Human Eval | 92% |
174
+ | Human Eval Plus| 87% |
175
+ | MBPP Base| 80% |
176
+ | MBPP Plus| 69% |
177
+
178
+ ### Math Benchmarks
179
+
180
+ | Benchmark | Score |
181
+ | --- | --- |
182
+ | GSM8K | 93% |
183
+ | Math Instruct (0-shot, no CoT) | 70% |
184
+ | Math Instruct (0-shot, CoT)| 71.5% |
185
+
186
+ ## Usage
187
+
188
+ The model can be used with two different frameworks:
189
+
190
+ - [`mistral_inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
191
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
192
+
193
+ ### Mistral Inference
194
+
195
+ #### Install
196
+
197
+ It is recommended to use `mistralai/Mistral-Large-Instruct-2407` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.
198
+
199
+ ```
200
+ pip install mistral_inference
201
+ ```
202
+
203
+ #### Download
204
+
205
+ ```py
206
+ from huggingface_hub import snapshot_download
207
+ from pathlib import Path
208
+
209
+ mistral_models_path = Path.home().joinpath('mistral_models', 'Large')
210
+ mistral_models_path.mkdir(parents=True, exist_ok=True)
211
+
212
+ snapshot_download(repo_id="mistralai/Mistral-Large-Instruct-2407", allow_patterns=["params.json", "consolidated-*.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
213
+ ```
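As a quick illustration of what the `allow_patterns` filter above selects, the sketch below applies the same three patterns to a few file names with Python's `fnmatch`. It assumes that `huggingface_hub` treats these patterns as shell-style globs; the sample file list and the helper `is_selected` are hypothetical.

```python
from fnmatch import fnmatch

# The same patterns passed to snapshot_download above.
ALLOW_PATTERNS = ["params.json", "consolidated-*.safetensors", "tokenizer.model.v3"]


def is_selected(filename: str) -> bool:
    """Return True if the file name matches at least one allow pattern."""
    return any(fnmatch(filename, pattern) for pattern in ALLOW_PATTERNS)


if __name__ == "__main__":
    # Hypothetical repository listing, used only to exercise the patterns.
    candidates = [
        "params.json",
        "consolidated-00001-of-00051.safetensors",
        "consolidated.safetensors.index.json",
        "tokenizer.model.v3",
        "config.json",
    ]
    for name in candidates:
        print(f"{name}: {'download' if is_selected(name) else 'skip'}")
```

Note that `consolidated.safetensors.index.json` does not match `consolidated-*.safetensors`, so under this assumption the index file would be skipped unless a pattern for it is added.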
214
+
215
+ #### Chat
216
+
217
+ After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment.
218
+ Given the size of this model, you will need a node with several GPUs (more than 300 GB of combined VRAM).
219
+ If you have 8 GPUs on your machine, you can chat with the model using:
220
+
221
+ ```
222
+ torchrun --nproc-per-node 8 --no-python mistral-chat $HOME/mistral_models/Large --instruct --max_tokens 256 --temperature 0.7
223
+ ```
224
+
225
+ For example, try a prompt like:
226
+ ```
227
+ How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar.
228
+ ```
229
+
230
+ #### Instruction following
231
+
232
+ ```py
233
+ from mistral_inference.transformer import Transformer
234
+ from mistral_inference.generate import generate
235
+
236
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
237
+ from mistral_common.protocol.instruct.messages import UserMessage
238
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
239
+
240
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
241
+ model = Transformer.from_folder(mistral_models_path)
242
+
243
+ prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."
244
+
245
+ completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
246
+
247
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
248
+
249
+ out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
250
+ result = tokenizer.decode(out_tokens[0])
251
+
252
+ print(result)
253
+ ```
254
+
255
+ #### Function calling
256
+
257
+ ```py
+ from mistral_common.protocol.instruct.tool_calls import Function, Tool
+ from mistral_inference.transformer import Transformer
+ from mistral_inference.generate import generate
+
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+ from mistral_common.protocol.instruct.messages import UserMessage
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
+
+
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
+ model = Transformer.from_folder(mistral_models_path)
+
+ completion_request = ChatCompletionRequest(
+     tools=[
+         Tool(
+             function=Function(
+                 name="get_current_weather",
+                 description="Get the current weather",
+                 parameters={
+                     "type": "object",
+                     "properties": {
+                         "location": {
+                             "type": "string",
+                             "description": "The city and state, e.g. San Francisco, CA",
+                         },
+                         "format": {
+                             "type": "string",
+                             "enum": ["celsius", "fahrenheit"],
+                             "description": "The temperature unit to use. Infer this from the users location.",
+                         },
+                     },
+                     "required": ["location", "format"],
+                 },
+             )
+         )
+     ],
+     messages=[
+         UserMessage(content="What's the weather like today in Paris?"),
+     ],
+ )
+
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
+
+ out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
+ result = tokenizer.decode(out_tokens[0])
+
+ print(result)
+ ```
306
+
307
+ ### Transformers
308
+
309
+ If you want to use Hugging Face `transformers` to generate text, you can do something like this.
310
+
311
+ ```py
312
+ from transformers import pipeline
313
+
314
+ messages = [
315
+ {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
316
+ {"role": "user", "content": "Who are you?"},
317
+ ]
318
+ chatbot = pipeline("text-generation", model="mistralai/Mistral-Large-Instruct-2407")
319
+ chatbot(messages)
320
+ ```
321
+
322
+ #### Function calling with `transformers`
323
+
324
+ To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the
325
+ [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling)
326
+ in the `transformers` docs for more information.
327
+
328
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "mistralai/Mistral-Large-Instruct-2407"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ def get_current_weather(location: str, format: str):
+     """
+     Get the current weather
+
+     Args:
+         location: The city and state, e.g. San Francisco, CA
+         format: The temperature unit to use. Infer this from the users location. (choices: ["celsius", "fahrenheit"])
+     """
+     pass
+
+ conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
+ tools = [get_current_weather]
+
+ # format and tokenize the tool use prompt
+ inputs = tokenizer.apply_chat_template(
+     conversation,
+     tools=tools,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ inputs.to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1000)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
363
+
364
+ Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool
365
+ results to the chat history so that the model can use them in its next generation. For a full tool calling example, please
366
+ see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling),
367
+ and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be
368
+ exactly 9 alphanumeric characters.
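A minimal sketch of how a tool call and its result could be appended to the chat history with a matching 9-character alphanumeric ID. The message layout follows the general shape shown in the `transformers` function calling guide, but the exact keys your chat template expects should be checked there. The helper `new_tool_call_id`, the chosen arguments and the sample result `"22.0"` are hypothetical.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits


def new_tool_call_id(length: int = 9) -> str:
    """Return a random ID made of exactly `length` alphanumeric characters."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]

# Record the tool call the model asked for (arguments are illustrative).
call_id = new_tool_call_id()
conversation.append({
    "role": "assistant",
    "tool_calls": [{
        "id": call_id,
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": {"location": "Paris, France", "format": "celsius"},
        },
    }],
})

# Record the tool result under the same ID so the two can be paired.
conversation.append({
    "role": "tool",
    "tool_call_id": call_id,
    "name": "get_current_weather",
    "content": "22.0",  # hypothetical tool output
})
```

The extended `conversation` can then be passed back through `tokenizer.apply_chat_template` with the same `tools` list for the next generation step.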
369
+
370
+ ## Limitations
371
+
372
+ The Mistral Large model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
373
+ It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
374
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
375
+
376
+ ## The Mistral AI Team
377
 
378
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
 
 
 
config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 12288,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 131072,
+   "model_type": "mistral",
+   "num_attention_heads": 96,
+   "num_hidden_layers": 88,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.3",
+   "use_cache": true,
+   "vocab_size": 32768,
+   "quantization_config": {
+     "quant_method": "exl3",
+     "version": "0.0.1",
+     "bits": 2.0,
+     "calibration": {
+       "rows": 100,
+       "cols": 2048
+     }
+   }
+ }
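As a sanity check on the configuration above, the sketch below estimates the parameter count from the `config.json` fields and compares it with the `total_size` reported in `consolidated.safetensors.index.json` in this same commit. It assumes the layout suggested by the tensor names in that index (`wq`/`wk`/`wv`/`wo` attention projections, a gated feed-forward with `w1`/`w2`/`w3`, two norm weights per layer), a head dimension of `hidden_size / num_attention_heads`, untied input and output embeddings plus one final norm, and 2 bytes per parameter for bf16. The function name `count_parameters` is illustrative.

```python
# Fields copied from config.json in this commit.
CONFIG = {
    "hidden_size": 12288,
    "intermediate_size": 28672,
    "num_attention_heads": 96,
    "num_hidden_layers": 88,
    "num_key_value_heads": 8,
    "vocab_size": 32768,
}
# metadata.total_size from consolidated.safetensors.index.json (bytes).
INDEX_TOTAL_SIZE = 245220139008


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count of a Mistral-style dense decoder."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]        # assumed, not stated in the config
    kv_dim = cfg["num_key_value_heads"] * head_dim    # grouped-query K/V width
    attention = 2 * h * h + 2 * h * kv_dim            # wq, wo + wk, wv
    feed_forward = 3 * h * cfg["intermediate_size"]   # w1, w2, w3
    norms = 2 * h                                     # attention_norm, ffn_norm
    per_layer = attention + feed_forward + norms
    embeddings = 2 * cfg["vocab_size"] * h            # input embedding + untied output head
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


if __name__ == "__main__":
    params = count_parameters(CONFIG)
    print(f"estimated parameters: {params / 1e9:.1f}B")
    print(f"bf16 size matches index: {params * 2 == INDEX_TOTAL_SIZE}")
```

With these assumptions the estimate comes to about 122.6B parameters, and twice that equals the index's `total_size`. That suggests the index describes unquantized bf16 weights rather than the 2.0-bit exl3 quantization named in `quantization_config`.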
consolidated.safetensors.index.json ADDED
@@ -0,0 +1,802 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 245220139008
4
+ },
5
+ "weight_map": {
6
+ "layers.0.attention.wk.weight": "consolidated-00001-of-00051.safetensors",
7
+ "layers.0.attention.wo.weight": "consolidated-00001-of-00051.safetensors",
8
+ "layers.0.attention.wq.weight": "consolidated-00001-of-00051.safetensors",
9
+ "layers.0.attention.wv.weight": "consolidated-00001-of-00051.safetensors",
10
+ "layers.0.attention_norm.weight": "consolidated-00001-of-00051.safetensors",
11
+ "layers.0.feed_forward.w1.weight": "consolidated-00001-of-00051.safetensors",
12
+ "layers.0.feed_forward.w2.weight": "consolidated-00001-of-00051.safetensors",
13
+ "layers.0.feed_forward.w3.weight": "consolidated-00001-of-00051.safetensors",
14
+ "layers.0.ffn_norm.weight": "consolidated-00001-of-00051.safetensors",
15
+ "layers.1.attention.wk.weight": "consolidated-00001-of-00051.safetensors",
16
+ "layers.1.attention.wo.weight": "consolidated-00001-of-00051.safetensors",
17
+ "layers.1.attention.wq.weight": "consolidated-00001-of-00051.safetensors",
18
+ "layers.1.attention.wv.weight": "consolidated-00001-of-00051.safetensors",
19
+ "layers.1.attention_norm.weight": "consolidated-00001-of-00051.safetensors",
20
+ "layers.1.feed_forward.w1.weight": "consolidated-00001-of-00051.safetensors",
21
+ "layers.1.feed_forward.w2.weight": "consolidated-00001-of-00051.safetensors",
22
+ "layers.1.feed_forward.w3.weight": "consolidated-00002-of-00051.safetensors",
23
+ "layers.1.ffn_norm.weight": "consolidated-00002-of-00051.safetensors",
24
+ "layers.10.attention.wk.weight": "consolidated-00002-of-00051.safetensors",
25
+ "layers.10.attention.wo.weight": "consolidated-00002-of-00051.safetensors",
26
+ "layers.10.attention.wq.weight": "consolidated-00002-of-00051.safetensors",
27
+ "layers.10.attention.wv.weight": "consolidated-00002-of-00051.safetensors",
28
+ "layers.10.attention_norm.weight": "consolidated-00002-of-00051.safetensors",
29
+ "layers.10.feed_forward.w1.weight": "consolidated-00002-of-00051.safetensors",
30
+ "layers.10.feed_forward.w2.weight": "consolidated-00002-of-00051.safetensors",
31
+ "layers.10.feed_forward.w3.weight": "consolidated-00002-of-00051.safetensors",
32
+ "layers.10.ffn_norm.weight": "consolidated-00002-of-00051.safetensors",
33
+ "layers.11.attention.wk.weight": "consolidated-00002-of-00051.safetensors",
34
+ "layers.11.attention.wo.weight": "consolidated-00002-of-00051.safetensors",
+ "layers.11.attention.wq.weight": "consolidated-00002-of-00051.safetensors",
+ "layers.11.attention.wv.weight": "consolidated-00002-of-00051.safetensors",
+ "layers.11.attention_norm.weight": "consolidated-00002-of-00051.safetensors",
+ "layers.11.feed_forward.w1.weight": "consolidated-00002-of-00051.safetensors",
+ "layers.11.feed_forward.w2.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.11.feed_forward.w3.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.11.ffn_norm.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.attention.wk.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.attention.wo.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.attention.wq.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.attention.wv.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.attention_norm.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.feed_forward.w1.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.feed_forward.w2.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.feed_forward.w3.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.12.ffn_norm.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.attention.wk.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.attention.wo.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.attention.wq.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.attention.wv.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.attention_norm.weight": "consolidated-00003-of-00051.safetensors",
+ "layers.13.feed_forward.w1.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.13.feed_forward.w2.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.13.feed_forward.w3.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.13.ffn_norm.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.attention.wk.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.attention.wo.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.attention.wq.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.attention.wv.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.attention_norm.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.feed_forward.w1.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.feed_forward.w2.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.feed_forward.w3.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.14.ffn_norm.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.15.attention.wk.weight": "consolidated-00004-of-00051.safetensors",
+ "layers.15.attention.wo.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.attention.wq.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.attention.wv.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.attention_norm.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.feed_forward.w1.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.feed_forward.w2.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.feed_forward.w3.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.15.ffn_norm.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.attention.wk.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.attention.wo.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.attention.wq.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.attention.wv.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.attention_norm.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.feed_forward.w1.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.feed_forward.w2.weight": "consolidated-00005-of-00051.safetensors",
+ "layers.16.feed_forward.w3.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.16.ffn_norm.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.attention.wk.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.attention.wo.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.attention.wq.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.attention.wv.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.attention_norm.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.feed_forward.w1.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.feed_forward.w2.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.feed_forward.w3.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.17.ffn_norm.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.attention.wk.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.attention.wo.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.attention.wq.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.attention.wv.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.attention_norm.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.feed_forward.w1.weight": "consolidated-00006-of-00051.safetensors",
+ "layers.18.feed_forward.w2.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.18.feed_forward.w3.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.18.ffn_norm.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.attention.wk.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.attention.wo.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.attention.wq.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.attention.wv.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.attention_norm.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.feed_forward.w1.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.feed_forward.w2.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.feed_forward.w3.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.19.ffn_norm.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.attention.wk.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.attention.wo.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.attention.wq.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.attention.wv.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.attention_norm.weight": "consolidated-00007-of-00051.safetensors",
+ "layers.2.feed_forward.w1.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.2.feed_forward.w2.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.2.feed_forward.w3.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.2.ffn_norm.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.attention.wk.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.attention.wo.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.attention.wq.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.attention.wv.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.attention_norm.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.feed_forward.w1.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.feed_forward.w2.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.feed_forward.w3.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.20.ffn_norm.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.21.attention.wk.weight": "consolidated-00008-of-00051.safetensors",
+ "layers.21.attention.wo.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.attention.wq.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.attention.wv.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.attention_norm.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.feed_forward.w1.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.feed_forward.w2.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.feed_forward.w3.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.21.ffn_norm.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.attention.wk.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.attention.wo.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.attention.wq.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.attention.wv.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.attention_norm.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.feed_forward.w1.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.feed_forward.w2.weight": "consolidated-00009-of-00051.safetensors",
+ "layers.22.feed_forward.w3.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.22.ffn_norm.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.attention.wk.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.attention.wo.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.attention.wq.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.attention.wv.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.attention_norm.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.feed_forward.w1.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.feed_forward.w2.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.feed_forward.w3.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.23.ffn_norm.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.attention.wk.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.attention.wo.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.attention.wq.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.attention.wv.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.attention_norm.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.feed_forward.w1.weight": "consolidated-00010-of-00051.safetensors",
+ "layers.24.feed_forward.w2.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.24.feed_forward.w3.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.24.ffn_norm.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.attention.wk.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.attention.wo.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.attention.wq.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.attention.wv.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.attention_norm.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.feed_forward.w1.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.feed_forward.w2.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.feed_forward.w3.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.25.ffn_norm.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.attention.wk.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.attention.wo.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.attention.wq.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.attention.wv.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.attention_norm.weight": "consolidated-00011-of-00051.safetensors",
+ "layers.26.feed_forward.w1.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.26.feed_forward.w2.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.26.feed_forward.w3.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.26.ffn_norm.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.attention.wk.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.attention.wo.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.attention.wq.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.attention.wv.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.attention_norm.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.feed_forward.w1.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.feed_forward.w2.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.feed_forward.w3.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.27.ffn_norm.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.28.attention.wk.weight": "consolidated-00012-of-00051.safetensors",
+ "layers.28.attention.wo.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.attention.wq.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.attention.wv.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.attention_norm.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.feed_forward.w1.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.feed_forward.w2.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.feed_forward.w3.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.28.ffn_norm.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.attention.wk.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.attention.wo.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.attention.wq.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.attention.wv.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.attention_norm.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.feed_forward.w1.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.feed_forward.w2.weight": "consolidated-00013-of-00051.safetensors",
+ "layers.29.feed_forward.w3.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.29.ffn_norm.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.attention.wk.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.attention.wo.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.attention.wq.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.attention.wv.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.attention_norm.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.feed_forward.w1.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.feed_forward.w2.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.feed_forward.w3.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.3.ffn_norm.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.attention.wk.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.attention.wo.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.attention.wq.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.attention.wv.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.attention_norm.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.feed_forward.w1.weight": "consolidated-00014-of-00051.safetensors",
+ "layers.30.feed_forward.w2.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.30.feed_forward.w3.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.30.ffn_norm.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.attention.wk.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.attention.wo.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.attention.wq.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.attention.wv.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.attention_norm.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.feed_forward.w1.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.feed_forward.w2.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.feed_forward.w3.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.31.ffn_norm.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.attention.wk.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.attention.wo.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.attention.wq.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.attention.wv.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.attention_norm.weight": "consolidated-00015-of-00051.safetensors",
+ "layers.32.feed_forward.w1.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.32.feed_forward.w2.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.32.feed_forward.w3.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.32.ffn_norm.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.attention.wk.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.attention.wo.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.attention.wq.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.attention.wv.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.attention_norm.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.feed_forward.w1.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.feed_forward.w2.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.feed_forward.w3.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.33.ffn_norm.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.34.attention.wk.weight": "consolidated-00016-of-00051.safetensors",
+ "layers.34.attention.wo.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.attention.wq.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.attention.wv.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.attention_norm.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.feed_forward.w1.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.feed_forward.w2.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.feed_forward.w3.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.34.ffn_norm.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.attention.wk.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.attention.wo.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.attention.wq.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.attention.wv.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.attention_norm.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.feed_forward.w1.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.feed_forward.w2.weight": "consolidated-00017-of-00051.safetensors",
+ "layers.35.feed_forward.w3.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.35.ffn_norm.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.attention.wk.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.attention.wo.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.attention.wq.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.attention.wv.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.attention_norm.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.feed_forward.w1.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.feed_forward.w2.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.feed_forward.w3.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.36.ffn_norm.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.attention.wk.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.attention.wo.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.attention.wq.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.attention.wv.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.attention_norm.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.feed_forward.w1.weight": "consolidated-00018-of-00051.safetensors",
+ "layers.37.feed_forward.w2.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.37.feed_forward.w3.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.37.ffn_norm.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.attention.wk.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.attention.wo.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.attention.wq.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.attention.wv.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.attention_norm.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.feed_forward.w1.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.feed_forward.w2.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.feed_forward.w3.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.38.ffn_norm.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.attention.wk.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.attention.wo.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.attention.wq.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.attention.wv.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.attention_norm.weight": "consolidated-00019-of-00051.safetensors",
+ "layers.39.feed_forward.w1.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.39.feed_forward.w2.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.39.feed_forward.w3.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.39.ffn_norm.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.attention.wk.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.attention.wo.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.attention.wq.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.attention.wv.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.attention_norm.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.feed_forward.w1.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.feed_forward.w2.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.feed_forward.w3.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.4.ffn_norm.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.40.attention.wk.weight": "consolidated-00020-of-00051.safetensors",
+ "layers.40.attention.wo.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.attention.wq.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.attention.wv.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.attention_norm.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.feed_forward.w1.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.feed_forward.w2.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.feed_forward.w3.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.40.ffn_norm.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.attention.wk.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.attention.wo.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.attention.wq.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.attention.wv.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.attention_norm.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.feed_forward.w1.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.feed_forward.w2.weight": "consolidated-00021-of-00051.safetensors",
+ "layers.41.feed_forward.w3.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.41.ffn_norm.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.attention.wk.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.attention.wo.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.attention.wq.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.attention.wv.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.attention_norm.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.feed_forward.w1.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.feed_forward.w2.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.feed_forward.w3.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.42.ffn_norm.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.attention.wk.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.attention.wo.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.attention.wq.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.attention.wv.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.attention_norm.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.feed_forward.w1.weight": "consolidated-00022-of-00051.safetensors",
+ "layers.43.feed_forward.w2.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.43.feed_forward.w3.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.43.ffn_norm.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.attention.wk.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.attention.wo.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.attention.wq.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.attention.wv.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.attention_norm.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.feed_forward.w1.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.feed_forward.w2.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.feed_forward.w3.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.44.ffn_norm.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.attention.wk.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.attention.wo.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.attention.wq.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.attention.wv.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.attention_norm.weight": "consolidated-00023-of-00051.safetensors",
+ "layers.45.feed_forward.w1.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.45.feed_forward.w2.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.45.feed_forward.w3.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.45.ffn_norm.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.attention.wk.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.attention.wo.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.attention.wq.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.attention.wv.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.attention_norm.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.feed_forward.w1.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.feed_forward.w2.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.feed_forward.w3.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.46.ffn_norm.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.47.attention.wk.weight": "consolidated-00024-of-00051.safetensors",
+ "layers.47.attention.wo.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.attention.wq.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.attention.wv.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.attention_norm.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.feed_forward.w1.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.feed_forward.w2.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.feed_forward.w3.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.47.ffn_norm.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.attention.wk.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.attention.wo.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.attention.wq.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.attention.wv.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.attention_norm.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.feed_forward.w1.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.feed_forward.w2.weight": "consolidated-00025-of-00051.safetensors",
+ "layers.48.feed_forward.w3.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.48.ffn_norm.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.attention.wk.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.attention.wo.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.attention.wq.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.attention.wv.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.attention_norm.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.feed_forward.w1.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.feed_forward.w2.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.feed_forward.w3.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.49.ffn_norm.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.attention.wk.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.attention.wo.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.attention.wq.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.attention.wv.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.attention_norm.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.feed_forward.w1.weight": "consolidated-00026-of-00051.safetensors",
+ "layers.5.feed_forward.w2.weight": "consolidated-00027-of-00051.safetensors",
+ "layers.5.feed_forward.w3.weight": "consolidated-00027-of-00051.safetensors",
+ "layers.5.ffn_norm.weight": "consolidated-00027-of-00051.safetensors",
+ "layers.50.attention.wk.weight": "consolidated-00027-of-00051.safetensors",
+ "layers.50.attention.wo.weight": "consolidated-00027-of-00051.safetensors",
+ "layers.50.attention.wq.weight": "consolidated-00027-of-00051.safetensors",
423
+ "layers.50.attention.wv.weight": "consolidated-00027-of-00051.safetensors",
424
+ "layers.50.attention_norm.weight": "consolidated-00027-of-00051.safetensors",
425
+ "layers.50.feed_forward.w1.weight": "consolidated-00027-of-00051.safetensors",
426
+ "layers.50.feed_forward.w2.weight": "consolidated-00027-of-00051.safetensors",
427
+ "layers.50.feed_forward.w3.weight": "consolidated-00027-of-00051.safetensors",
428
+ "layers.50.ffn_norm.weight": "consolidated-00027-of-00051.safetensors",
429
+ "layers.51.attention.wk.weight": "consolidated-00027-of-00051.safetensors",
430
+ "layers.51.attention.wo.weight": "consolidated-00027-of-00051.safetensors",
431
+ "layers.51.attention.wq.weight": "consolidated-00027-of-00051.safetensors",
432
+ "layers.51.attention.wv.weight": "consolidated-00027-of-00051.safetensors",
433
+ "layers.51.attention_norm.weight": "consolidated-00027-of-00051.safetensors",
434
+ "layers.51.feed_forward.w1.weight": "consolidated-00028-of-00051.safetensors",
435
+ "layers.51.feed_forward.w2.weight": "consolidated-00028-of-00051.safetensors",
436
+ "layers.51.feed_forward.w3.weight": "consolidated-00028-of-00051.safetensors",
437
+ "layers.51.ffn_norm.weight": "consolidated-00028-of-00051.safetensors",
438
+ "layers.52.attention.wk.weight": "consolidated-00028-of-00051.safetensors",
439
+ "layers.52.attention.wo.weight": "consolidated-00028-of-00051.safetensors",
440
+ "layers.52.attention.wq.weight": "consolidated-00028-of-00051.safetensors",
441
+ "layers.52.attention.wv.weight": "consolidated-00028-of-00051.safetensors",
442
+ "layers.52.attention_norm.weight": "consolidated-00028-of-00051.safetensors",
443
+ "layers.52.feed_forward.w1.weight": "consolidated-00028-of-00051.safetensors",
444
+ "layers.52.feed_forward.w2.weight": "consolidated-00028-of-00051.safetensors",
445
+ "layers.52.feed_forward.w3.weight": "consolidated-00028-of-00051.safetensors",
446
+ "layers.52.ffn_norm.weight": "consolidated-00028-of-00051.safetensors",
447
+ "layers.53.attention.wk.weight": "consolidated-00028-of-00051.safetensors",
448
+ "layers.53.attention.wo.weight": "consolidated-00029-of-00051.safetensors",
449
+ "layers.53.attention.wq.weight": "consolidated-00029-of-00051.safetensors",
450
+ "layers.53.attention.wv.weight": "consolidated-00029-of-00051.safetensors",
451
+ "layers.53.attention_norm.weight": "consolidated-00029-of-00051.safetensors",
452
+ "layers.53.feed_forward.w1.weight": "consolidated-00029-of-00051.safetensors",
453
+ "layers.53.feed_forward.w2.weight": "consolidated-00029-of-00051.safetensors",
454
+ "layers.53.feed_forward.w3.weight": "consolidated-00029-of-00051.safetensors",
455
+ "layers.53.ffn_norm.weight": "consolidated-00029-of-00051.safetensors",
456
+ "layers.54.attention.wk.weight": "consolidated-00029-of-00051.safetensors",
457
+ "layers.54.attention.wo.weight": "consolidated-00029-of-00051.safetensors",
458
+ "layers.54.attention.wq.weight": "consolidated-00029-of-00051.safetensors",
459
+ "layers.54.attention.wv.weight": "consolidated-00029-of-00051.safetensors",
460
+ "layers.54.attention_norm.weight": "consolidated-00029-of-00051.safetensors",
461
+ "layers.54.feed_forward.w1.weight": "consolidated-00029-of-00051.safetensors",
462
+ "layers.54.feed_forward.w2.weight": "consolidated-00029-of-00051.safetensors",
463
+ "layers.54.feed_forward.w3.weight": "consolidated-00030-of-00051.safetensors",
464
+ "layers.54.ffn_norm.weight": "consolidated-00030-of-00051.safetensors",
465
+ "layers.55.attention.wk.weight": "consolidated-00030-of-00051.safetensors",
466
+ "layers.55.attention.wo.weight": "consolidated-00030-of-00051.safetensors",
467
+ "layers.55.attention.wq.weight": "consolidated-00030-of-00051.safetensors",
468
+ "layers.55.attention.wv.weight": "consolidated-00030-of-00051.safetensors",
469
+ "layers.55.attention_norm.weight": "consolidated-00030-of-00051.safetensors",
470
+ "layers.55.feed_forward.w1.weight": "consolidated-00030-of-00051.safetensors",
471
+ "layers.55.feed_forward.w2.weight": "consolidated-00030-of-00051.safetensors",
472
+ "layers.55.feed_forward.w3.weight": "consolidated-00030-of-00051.safetensors",
473
+ "layers.55.ffn_norm.weight": "consolidated-00030-of-00051.safetensors",
474
+ "layers.56.attention.wk.weight": "consolidated-00030-of-00051.safetensors",
475
+ "layers.56.attention.wo.weight": "consolidated-00030-of-00051.safetensors",
476
+ "layers.56.attention.wq.weight": "consolidated-00030-of-00051.safetensors",
477
+ "layers.56.attention.wv.weight": "consolidated-00030-of-00051.safetensors",
478
+ "layers.56.attention_norm.weight": "consolidated-00030-of-00051.safetensors",
479
+ "layers.56.feed_forward.w1.weight": "consolidated-00030-of-00051.safetensors",
480
+ "layers.56.feed_forward.w2.weight": "consolidated-00031-of-00051.safetensors",
481
+ "layers.56.feed_forward.w3.weight": "consolidated-00031-of-00051.safetensors",
482
+ "layers.56.ffn_norm.weight": "consolidated-00031-of-00051.safetensors",
483
+ "layers.57.attention.wk.weight": "consolidated-00031-of-00051.safetensors",
484
+ "layers.57.attention.wo.weight": "consolidated-00031-of-00051.safetensors",
485
+ "layers.57.attention.wq.weight": "consolidated-00031-of-00051.safetensors",
486
+ "layers.57.attention.wv.weight": "consolidated-00031-of-00051.safetensors",
487
+ "layers.57.attention_norm.weight": "consolidated-00031-of-00051.safetensors",
488
+ "layers.57.feed_forward.w1.weight": "consolidated-00031-of-00051.safetensors",
489
+ "layers.57.feed_forward.w2.weight": "consolidated-00031-of-00051.safetensors",
490
+ "layers.57.feed_forward.w3.weight": "consolidated-00031-of-00051.safetensors",
491
+ "layers.57.ffn_norm.weight": "consolidated-00031-of-00051.safetensors",
492
+ "layers.58.attention.wk.weight": "consolidated-00031-of-00051.safetensors",
493
+ "layers.58.attention.wo.weight": "consolidated-00031-of-00051.safetensors",
494
+ "layers.58.attention.wq.weight": "consolidated-00031-of-00051.safetensors",
495
+ "layers.58.attention.wv.weight": "consolidated-00031-of-00051.safetensors",
496
+ "layers.58.attention_norm.weight": "consolidated-00031-of-00051.safetensors",
497
+ "layers.58.feed_forward.w1.weight": "consolidated-00032-of-00051.safetensors",
498
+ "layers.58.feed_forward.w2.weight": "consolidated-00032-of-00051.safetensors",
499
+ "layers.58.feed_forward.w3.weight": "consolidated-00032-of-00051.safetensors",
500
+ "layers.58.ffn_norm.weight": "consolidated-00032-of-00051.safetensors",
501
+ "layers.59.attention.wk.weight": "consolidated-00032-of-00051.safetensors",
502
+ "layers.59.attention.wo.weight": "consolidated-00032-of-00051.safetensors",
503
+ "layers.59.attention.wq.weight": "consolidated-00032-of-00051.safetensors",
504
+ "layers.59.attention.wv.weight": "consolidated-00032-of-00051.safetensors",
505
+ "layers.59.attention_norm.weight": "consolidated-00032-of-00051.safetensors",
506
+ "layers.59.feed_forward.w1.weight": "consolidated-00032-of-00051.safetensors",
507
+ "layers.59.feed_forward.w2.weight": "consolidated-00032-of-00051.safetensors",
508
+ "layers.59.feed_forward.w3.weight": "consolidated-00032-of-00051.safetensors",
509
+ "layers.59.ffn_norm.weight": "consolidated-00032-of-00051.safetensors",
510
+ "layers.6.attention.wk.weight": "consolidated-00032-of-00051.safetensors",
511
+ "layers.6.attention.wo.weight": "consolidated-00033-of-00051.safetensors",
512
+ "layers.6.attention.wq.weight": "consolidated-00033-of-00051.safetensors",
513
+ "layers.6.attention.wv.weight": "consolidated-00033-of-00051.safetensors",
514
+ "layers.6.attention_norm.weight": "consolidated-00033-of-00051.safetensors",
515
+ "layers.6.feed_forward.w1.weight": "consolidated-00033-of-00051.safetensors",
516
+ "layers.6.feed_forward.w2.weight": "consolidated-00033-of-00051.safetensors",
517
+ "layers.6.feed_forward.w3.weight": "consolidated-00033-of-00051.safetensors",
518
+ "layers.6.ffn_norm.weight": "consolidated-00033-of-00051.safetensors",
519
+ "layers.60.attention.wk.weight": "consolidated-00033-of-00051.safetensors",
520
+ "layers.60.attention.wo.weight": "consolidated-00033-of-00051.safetensors",
521
+ "layers.60.attention.wq.weight": "consolidated-00033-of-00051.safetensors",
522
+ "layers.60.attention.wv.weight": "consolidated-00033-of-00051.safetensors",
523
+ "layers.60.attention_norm.weight": "consolidated-00033-of-00051.safetensors",
524
+ "layers.60.feed_forward.w1.weight": "consolidated-00033-of-00051.safetensors",
525
+ "layers.60.feed_forward.w2.weight": "consolidated-00033-of-00051.safetensors",
526
+ "layers.60.feed_forward.w3.weight": "consolidated-00034-of-00051.safetensors",
527
+ "layers.60.ffn_norm.weight": "consolidated-00034-of-00051.safetensors",
528
+ "layers.61.attention.wk.weight": "consolidated-00034-of-00051.safetensors",
529
+ "layers.61.attention.wo.weight": "consolidated-00034-of-00051.safetensors",
530
+ "layers.61.attention.wq.weight": "consolidated-00034-of-00051.safetensors",
531
+ "layers.61.attention.wv.weight": "consolidated-00034-of-00051.safetensors",
532
+ "layers.61.attention_norm.weight": "consolidated-00034-of-00051.safetensors",
533
+ "layers.61.feed_forward.w1.weight": "consolidated-00034-of-00051.safetensors",
534
+ "layers.61.feed_forward.w2.weight": "consolidated-00034-of-00051.safetensors",
535
+ "layers.61.feed_forward.w3.weight": "consolidated-00034-of-00051.safetensors",
536
+ "layers.61.ffn_norm.weight": "consolidated-00034-of-00051.safetensors",
537
+ "layers.62.attention.wk.weight": "consolidated-00034-of-00051.safetensors",
538
+ "layers.62.attention.wo.weight": "consolidated-00034-of-00051.safetensors",
539
+ "layers.62.attention.wq.weight": "consolidated-00034-of-00051.safetensors",
540
+ "layers.62.attention.wv.weight": "consolidated-00034-of-00051.safetensors",
541
+ "layers.62.attention_norm.weight": "consolidated-00034-of-00051.safetensors",
542
+ "layers.62.feed_forward.w1.weight": "consolidated-00034-of-00051.safetensors",
543
+ "layers.62.feed_forward.w2.weight": "consolidated-00035-of-00051.safetensors",
544
+ "layers.62.feed_forward.w3.weight": "consolidated-00035-of-00051.safetensors",
545
+ "layers.62.ffn_norm.weight": "consolidated-00035-of-00051.safetensors",
546
+ "layers.63.attention.wk.weight": "consolidated-00035-of-00051.safetensors",
547
+ "layers.63.attention.wo.weight": "consolidated-00035-of-00051.safetensors",
548
+ "layers.63.attention.wq.weight": "consolidated-00035-of-00051.safetensors",
549
+ "layers.63.attention.wv.weight": "consolidated-00035-of-00051.safetensors",
550
+ "layers.63.attention_norm.weight": "consolidated-00035-of-00051.safetensors",
551
+ "layers.63.feed_forward.w1.weight": "consolidated-00035-of-00051.safetensors",
552
+ "layers.63.feed_forward.w2.weight": "consolidated-00035-of-00051.safetensors",
553
+ "layers.63.feed_forward.w3.weight": "consolidated-00035-of-00051.safetensors",
554
+ "layers.63.ffn_norm.weight": "consolidated-00035-of-00051.safetensors",
555
+ "layers.64.attention.wk.weight": "consolidated-00035-of-00051.safetensors",
556
+ "layers.64.attention.wo.weight": "consolidated-00035-of-00051.safetensors",
557
+ "layers.64.attention.wq.weight": "consolidated-00035-of-00051.safetensors",
558
+ "layers.64.attention.wv.weight": "consolidated-00035-of-00051.safetensors",
559
+ "layers.64.attention_norm.weight": "consolidated-00035-of-00051.safetensors",
560
+ "layers.64.feed_forward.w1.weight": "consolidated-00036-of-00051.safetensors",
561
+ "layers.64.feed_forward.w2.weight": "consolidated-00036-of-00051.safetensors",
562
+ "layers.64.feed_forward.w3.weight": "consolidated-00036-of-00051.safetensors",
563
+ "layers.64.ffn_norm.weight": "consolidated-00036-of-00051.safetensors",
564
+ "layers.65.attention.wk.weight": "consolidated-00036-of-00051.safetensors",
565
+ "layers.65.attention.wo.weight": "consolidated-00036-of-00051.safetensors",
566
+ "layers.65.attention.wq.weight": "consolidated-00036-of-00051.safetensors",
567
+ "layers.65.attention.wv.weight": "consolidated-00036-of-00051.safetensors",
568
+ "layers.65.attention_norm.weight": "consolidated-00036-of-00051.safetensors",
569
+ "layers.65.feed_forward.w1.weight": "consolidated-00036-of-00051.safetensors",
570
+ "layers.65.feed_forward.w2.weight": "consolidated-00036-of-00051.safetensors",
571
+ "layers.65.feed_forward.w3.weight": "consolidated-00036-of-00051.safetensors",
572
+ "layers.65.ffn_norm.weight": "consolidated-00036-of-00051.safetensors",
573
+ "layers.66.attention.wk.weight": "consolidated-00036-of-00051.safetensors",
574
+ "layers.66.attention.wo.weight": "consolidated-00037-of-00051.safetensors",
575
+ "layers.66.attention.wq.weight": "consolidated-00037-of-00051.safetensors",
576
+ "layers.66.attention.wv.weight": "consolidated-00037-of-00051.safetensors",
577
+ "layers.66.attention_norm.weight": "consolidated-00037-of-00051.safetensors",
578
+ "layers.66.feed_forward.w1.weight": "consolidated-00037-of-00051.safetensors",
579
+ "layers.66.feed_forward.w2.weight": "consolidated-00037-of-00051.safetensors",
580
+ "layers.66.feed_forward.w3.weight": "consolidated-00037-of-00051.safetensors",
581
+ "layers.66.ffn_norm.weight": "consolidated-00037-of-00051.safetensors",
582
+ "layers.67.attention.wk.weight": "consolidated-00037-of-00051.safetensors",
583
+ "layers.67.attention.wo.weight": "consolidated-00037-of-00051.safetensors",
584
+ "layers.67.attention.wq.weight": "consolidated-00037-of-00051.safetensors",
585
+ "layers.67.attention.wv.weight": "consolidated-00037-of-00051.safetensors",
586
+ "layers.67.attention_norm.weight": "consolidated-00037-of-00051.safetensors",
587
+ "layers.67.feed_forward.w1.weight": "consolidated-00037-of-00051.safetensors",
588
+ "layers.67.feed_forward.w2.weight": "consolidated-00037-of-00051.safetensors",
589
+ "layers.67.feed_forward.w3.weight": "consolidated-00038-of-00051.safetensors",
590
+ "layers.67.ffn_norm.weight": "consolidated-00038-of-00051.safetensors",
591
+ "layers.68.attention.wk.weight": "consolidated-00038-of-00051.safetensors",
592
+ "layers.68.attention.wo.weight": "consolidated-00038-of-00051.safetensors",
593
+ "layers.68.attention.wq.weight": "consolidated-00038-of-00051.safetensors",
594
+ "layers.68.attention.wv.weight": "consolidated-00038-of-00051.safetensors",
595
+ "layers.68.attention_norm.weight": "consolidated-00038-of-00051.safetensors",
596
+ "layers.68.feed_forward.w1.weight": "consolidated-00038-of-00051.safetensors",
597
+ "layers.68.feed_forward.w2.weight": "consolidated-00038-of-00051.safetensors",
598
+ "layers.68.feed_forward.w3.weight": "consolidated-00038-of-00051.safetensors",
599
+ "layers.68.ffn_norm.weight": "consolidated-00038-of-00051.safetensors",
600
+ "layers.69.attention.wk.weight": "consolidated-00038-of-00051.safetensors",
601
+ "layers.69.attention.wo.weight": "consolidated-00038-of-00051.safetensors",
602
+ "layers.69.attention.wq.weight": "consolidated-00038-of-00051.safetensors",
603
+ "layers.69.attention.wv.weight": "consolidated-00038-of-00051.safetensors",
604
+ "layers.69.attention_norm.weight": "consolidated-00038-of-00051.safetensors",
605
+ "layers.69.feed_forward.w1.weight": "consolidated-00038-of-00051.safetensors",
606
+ "layers.69.feed_forward.w2.weight": "consolidated-00039-of-00051.safetensors",
607
+ "layers.69.feed_forward.w3.weight": "consolidated-00039-of-00051.safetensors",
608
+ "layers.69.ffn_norm.weight": "consolidated-00039-of-00051.safetensors",
609
+ "layers.7.attention.wk.weight": "consolidated-00039-of-00051.safetensors",
610
+ "layers.7.attention.wo.weight": "consolidated-00039-of-00051.safetensors",
611
+ "layers.7.attention.wq.weight": "consolidated-00039-of-00051.safetensors",
612
+ "layers.7.attention.wv.weight": "consolidated-00039-of-00051.safetensors",
613
+ "layers.7.attention_norm.weight": "consolidated-00039-of-00051.safetensors",
614
+ "layers.7.feed_forward.w1.weight": "consolidated-00039-of-00051.safetensors",
615
+ "layers.7.feed_forward.w2.weight": "consolidated-00039-of-00051.safetensors",
616
+ "layers.7.feed_forward.w3.weight": "consolidated-00039-of-00051.safetensors",
617
+ "layers.7.ffn_norm.weight": "consolidated-00039-of-00051.safetensors",
618
+ "layers.70.attention.wk.weight": "consolidated-00039-of-00051.safetensors",
619
+ "layers.70.attention.wo.weight": "consolidated-00039-of-00051.safetensors",
620
+ "layers.70.attention.wq.weight": "consolidated-00039-of-00051.safetensors",
621
+ "layers.70.attention.wv.weight": "consolidated-00039-of-00051.safetensors",
622
+ "layers.70.attention_norm.weight": "consolidated-00039-of-00051.safetensors",
623
+ "layers.70.feed_forward.w1.weight": "consolidated-00040-of-00051.safetensors",
624
+ "layers.70.feed_forward.w2.weight": "consolidated-00040-of-00051.safetensors",
625
+ "layers.70.feed_forward.w3.weight": "consolidated-00040-of-00051.safetensors",
626
+ "layers.70.ffn_norm.weight": "consolidated-00040-of-00051.safetensors",
627
+ "layers.71.attention.wk.weight": "consolidated-00040-of-00051.safetensors",
628
+ "layers.71.attention.wo.weight": "consolidated-00040-of-00051.safetensors",
629
+ "layers.71.attention.wq.weight": "consolidated-00040-of-00051.safetensors",
630
+ "layers.71.attention.wv.weight": "consolidated-00040-of-00051.safetensors",
631
+ "layers.71.attention_norm.weight": "consolidated-00040-of-00051.safetensors",
632
+ "layers.71.feed_forward.w1.weight": "consolidated-00040-of-00051.safetensors",
633
+ "layers.71.feed_forward.w2.weight": "consolidated-00040-of-00051.safetensors",
634
+ "layers.71.feed_forward.w3.weight": "consolidated-00040-of-00051.safetensors",
635
+ "layers.71.ffn_norm.weight": "consolidated-00040-of-00051.safetensors",
636
+ "layers.72.attention.wk.weight": "consolidated-00040-of-00051.safetensors",
637
+ "layers.72.attention.wo.weight": "consolidated-00041-of-00051.safetensors",
638
+ "layers.72.attention.wq.weight": "consolidated-00041-of-00051.safetensors",
639
+ "layers.72.attention.wv.weight": "consolidated-00041-of-00051.safetensors",
640
+ "layers.72.attention_norm.weight": "consolidated-00041-of-00051.safetensors",
641
+ "layers.72.feed_forward.w1.weight": "consolidated-00041-of-00051.safetensors",
642
+ "layers.72.feed_forward.w2.weight": "consolidated-00041-of-00051.safetensors",
643
+ "layers.72.feed_forward.w3.weight": "consolidated-00041-of-00051.safetensors",
644
+ "layers.72.ffn_norm.weight": "consolidated-00041-of-00051.safetensors",
645
+ "layers.73.attention.wk.weight": "consolidated-00041-of-00051.safetensors",
646
+ "layers.73.attention.wo.weight": "consolidated-00041-of-00051.safetensors",
647
+ "layers.73.attention.wq.weight": "consolidated-00041-of-00051.safetensors",
648
+ "layers.73.attention.wv.weight": "consolidated-00041-of-00051.safetensors",
649
+ "layers.73.attention_norm.weight": "consolidated-00041-of-00051.safetensors",
650
+ "layers.73.feed_forward.w1.weight": "consolidated-00041-of-00051.safetensors",
651
+ "layers.73.feed_forward.w2.weight": "consolidated-00041-of-00051.safetensors",
652
+ "layers.73.feed_forward.w3.weight": "consolidated-00042-of-00051.safetensors",
653
+ "layers.73.ffn_norm.weight": "consolidated-00042-of-00051.safetensors",
654
+ "layers.74.attention.wk.weight": "consolidated-00042-of-00051.safetensors",
655
+ "layers.74.attention.wo.weight": "consolidated-00042-of-00051.safetensors",
656
+ "layers.74.attention.wq.weight": "consolidated-00042-of-00051.safetensors",
657
+ "layers.74.attention.wv.weight": "consolidated-00042-of-00051.safetensors",
658
+ "layers.74.attention_norm.weight": "consolidated-00042-of-00051.safetensors",
659
+ "layers.74.feed_forward.w1.weight": "consolidated-00042-of-00051.safetensors",
660
+ "layers.74.feed_forward.w2.weight": "consolidated-00042-of-00051.safetensors",
661
+ "layers.74.feed_forward.w3.weight": "consolidated-00042-of-00051.safetensors",
662
+ "layers.74.ffn_norm.weight": "consolidated-00042-of-00051.safetensors",
663
+ "layers.75.attention.wk.weight": "consolidated-00042-of-00051.safetensors",
664
+ "layers.75.attention.wo.weight": "consolidated-00042-of-00051.safetensors",
665
+ "layers.75.attention.wq.weight": "consolidated-00042-of-00051.safetensors",
666
+ "layers.75.attention.wv.weight": "consolidated-00042-of-00051.safetensors",
667
+ "layers.75.attention_norm.weight": "consolidated-00042-of-00051.safetensors",
668
+ "layers.75.feed_forward.w1.weight": "consolidated-00042-of-00051.safetensors",
669
+ "layers.75.feed_forward.w2.weight": "consolidated-00043-of-00051.safetensors",
670
+ "layers.75.feed_forward.w3.weight": "consolidated-00043-of-00051.safetensors",
671
+ "layers.75.ffn_norm.weight": "consolidated-00043-of-00051.safetensors",
672
+ "layers.76.attention.wk.weight": "consolidated-00043-of-00051.safetensors",
673
+ "layers.76.attention.wo.weight": "consolidated-00043-of-00051.safetensors",
674
+ "layers.76.attention.wq.weight": "consolidated-00043-of-00051.safetensors",
675
+ "layers.76.attention.wv.weight": "consolidated-00043-of-00051.safetensors",
676
+ "layers.76.attention_norm.weight": "consolidated-00043-of-00051.safetensors",
677
+ "layers.76.feed_forward.w1.weight": "consolidated-00043-of-00051.safetensors",
678
+ "layers.76.feed_forward.w2.weight": "consolidated-00043-of-00051.safetensors",
679
+ "layers.76.feed_forward.w3.weight": "consolidated-00043-of-00051.safetensors",
680
+ "layers.76.ffn_norm.weight": "consolidated-00043-of-00051.safetensors",
681
+ "layers.77.attention.wk.weight": "consolidated-00043-of-00051.safetensors",
682
+ "layers.77.attention.wo.weight": "consolidated-00043-of-00051.safetensors",
683
+ "layers.77.attention.wq.weight": "consolidated-00043-of-00051.safetensors",
684
+ "layers.77.attention.wv.weight": "consolidated-00043-of-00051.safetensors",
685
+ "layers.77.attention_norm.weight": "consolidated-00043-of-00051.safetensors",
686
+ "layers.77.feed_forward.w1.weight": "consolidated-00044-of-00051.safetensors",
687
+ "layers.77.feed_forward.w2.weight": "consolidated-00044-of-00051.safetensors",
688
+ "layers.77.feed_forward.w3.weight": "consolidated-00044-of-00051.safetensors",
689
+ "layers.77.ffn_norm.weight": "consolidated-00044-of-00051.safetensors",
690
+ "layers.78.attention.wk.weight": "consolidated-00044-of-00051.safetensors",
691
+ "layers.78.attention.wo.weight": "consolidated-00044-of-00051.safetensors",
692
+ "layers.78.attention.wq.weight": "consolidated-00044-of-00051.safetensors",
693
+ "layers.78.attention.wv.weight": "consolidated-00044-of-00051.safetensors",
694
+ "layers.78.attention_norm.weight": "consolidated-00044-of-00051.safetensors",
695
+ "layers.78.feed_forward.w1.weight": "consolidated-00044-of-00051.safetensors",
696
+ "layers.78.feed_forward.w2.weight": "consolidated-00044-of-00051.safetensors",
697
+ "layers.78.feed_forward.w3.weight": "consolidated-00044-of-00051.safetensors",
698
+ "layers.78.ffn_norm.weight": "consolidated-00044-of-00051.safetensors",
699
+ "layers.79.attention.wk.weight": "consolidated-00044-of-00051.safetensors",
700
+ "layers.79.attention.wo.weight": "consolidated-00045-of-00051.safetensors",
701
+ "layers.79.attention.wq.weight": "consolidated-00045-of-00051.safetensors",
702
+ "layers.79.attention.wv.weight": "consolidated-00045-of-00051.safetensors",
703
+ "layers.79.attention_norm.weight": "consolidated-00045-of-00051.safetensors",
704
+ "layers.79.feed_forward.w1.weight": "consolidated-00045-of-00051.safetensors",
705
+ "layers.79.feed_forward.w2.weight": "consolidated-00045-of-00051.safetensors",
706
+ "layers.79.feed_forward.w3.weight": "consolidated-00045-of-00051.safetensors",
707
+ "layers.79.ffn_norm.weight": "consolidated-00045-of-00051.safetensors",
708
+ "layers.8.attention.wk.weight": "consolidated-00045-of-00051.safetensors",
709
+ "layers.8.attention.wo.weight": "consolidated-00045-of-00051.safetensors",
710
+ "layers.8.attention.wq.weight": "consolidated-00045-of-00051.safetensors",
711
+ "layers.8.attention.wv.weight": "consolidated-00045-of-00051.safetensors",
712
+ "layers.8.attention_norm.weight": "consolidated-00045-of-00051.safetensors",
713
+ "layers.8.feed_forward.w1.weight": "consolidated-00045-of-00051.safetensors",
714
+ "layers.8.feed_forward.w2.weight": "consolidated-00045-of-00051.safetensors",
715
+ "layers.8.feed_forward.w3.weight": "consolidated-00046-of-00051.safetensors",
716
+ "layers.8.ffn_norm.weight": "consolidated-00046-of-00051.safetensors",
717
+ "layers.80.attention.wk.weight": "consolidated-00046-of-00051.safetensors",
718
+ "layers.80.attention.wo.weight": "consolidated-00046-of-00051.safetensors",
719
+ "layers.80.attention.wq.weight": "consolidated-00046-of-00051.safetensors",
720
+ "layers.80.attention.wv.weight": "consolidated-00046-of-00051.safetensors",
721
+ "layers.80.attention_norm.weight": "consolidated-00046-of-00051.safetensors",
722
+ "layers.80.feed_forward.w1.weight": "consolidated-00046-of-00051.safetensors",
723
+ "layers.80.feed_forward.w2.weight": "consolidated-00046-of-00051.safetensors",
724
+ "layers.80.feed_forward.w3.weight": "consolidated-00046-of-00051.safetensors",
725
+ "layers.80.ffn_norm.weight": "consolidated-00046-of-00051.safetensors",
726
+ "layers.81.attention.wk.weight": "consolidated-00046-of-00051.safetensors",
727
+ "layers.81.attention.wo.weight": "consolidated-00046-of-00051.safetensors",
728
+ "layers.81.attention.wq.weight": "consolidated-00046-of-00051.safetensors",
729
+ "layers.81.attention.wv.weight": "consolidated-00046-of-00051.safetensors",
730
+ "layers.81.attention_norm.weight": "consolidated-00046-of-00051.safetensors",
731
+ "layers.81.feed_forward.w1.weight": "consolidated-00046-of-00051.safetensors",
732
+ "layers.81.feed_forward.w2.weight": "consolidated-00047-of-00051.safetensors",
733
+ "layers.81.feed_forward.w3.weight": "consolidated-00047-of-00051.safetensors",
734
+ "layers.81.ffn_norm.weight": "consolidated-00047-of-00051.safetensors",
735
+ "layers.82.attention.wk.weight": "consolidated-00047-of-00051.safetensors",
736
+ "layers.82.attention.wo.weight": "consolidated-00047-of-00051.safetensors",
737
+ "layers.82.attention.wq.weight": "consolidated-00047-of-00051.safetensors",
738
+ "layers.82.attention.wv.weight": "consolidated-00047-of-00051.safetensors",
739
+ "layers.82.attention_norm.weight": "consolidated-00047-of-00051.safetensors",
740
+ "layers.82.feed_forward.w1.weight": "consolidated-00047-of-00051.safetensors",
741
+ "layers.82.feed_forward.w2.weight": "consolidated-00047-of-00051.safetensors",
742
+ "layers.82.feed_forward.w3.weight": "consolidated-00047-of-00051.safetensors",
743
+ "layers.82.ffn_norm.weight": "consolidated-00047-of-00051.safetensors",
744
+ "layers.83.attention.wk.weight": "consolidated-00047-of-00051.safetensors",
745
+ "layers.83.attention.wo.weight": "consolidated-00047-of-00051.safetensors",
746
+ "layers.83.attention.wq.weight": "consolidated-00047-of-00051.safetensors",
747
+ "layers.83.attention.wv.weight": "consolidated-00047-of-00051.safetensors",
748
+ "layers.83.attention_norm.weight": "consolidated-00047-of-00051.safetensors",
749
+ "layers.83.feed_forward.w1.weight": "consolidated-00048-of-00051.safetensors",
750
+ "layers.83.feed_forward.w2.weight": "consolidated-00048-of-00051.safetensors",
751
+ "layers.83.feed_forward.w3.weight": "consolidated-00048-of-00051.safetensors",
752
+ "layers.83.ffn_norm.weight": "consolidated-00048-of-00051.safetensors",
753
+ "layers.84.attention.wk.weight": "consolidated-00048-of-00051.safetensors",
754
+ "layers.84.attention.wo.weight": "consolidated-00048-of-00051.safetensors",
755
+ "layers.84.attention.wq.weight": "consolidated-00048-of-00051.safetensors",
756
+ "layers.84.attention.wv.weight": "consolidated-00048-of-00051.safetensors",
757
+ "layers.84.attention_norm.weight": "consolidated-00048-of-00051.safetensors",
758
+ "layers.84.feed_forward.w1.weight": "consolidated-00048-of-00051.safetensors",
759
+ "layers.84.feed_forward.w2.weight": "consolidated-00048-of-00051.safetensors",
760
+ "layers.84.feed_forward.w3.weight": "consolidated-00048-of-00051.safetensors",
761
+ "layers.84.ffn_norm.weight": "consolidated-00048-of-00051.safetensors",
762
+ "layers.85.attention.wk.weight": "consolidated-00048-of-00051.safetensors",
763
+ "layers.85.attention.wo.weight": "consolidated-00049-of-00051.safetensors",
764
+ "layers.85.attention.wq.weight": "consolidated-00049-of-00051.safetensors",
765
+ "layers.85.attention.wv.weight": "consolidated-00049-of-00051.safetensors",
766
+ "layers.85.attention_norm.weight": "consolidated-00049-of-00051.safetensors",
767
+ "layers.85.feed_forward.w1.weight": "consolidated-00049-of-00051.safetensors",
768
+ "layers.85.feed_forward.w2.weight": "consolidated-00049-of-00051.safetensors",
769
+ "layers.85.feed_forward.w3.weight": "consolidated-00049-of-00051.safetensors",
770
+ "layers.85.ffn_norm.weight": "consolidated-00049-of-00051.safetensors",
771
+ "layers.86.attention.wk.weight": "consolidated-00049-of-00051.safetensors",
772
+ "layers.86.attention.wo.weight": "consolidated-00049-of-00051.safetensors",
773
+ "layers.86.attention.wq.weight": "consolidated-00049-of-00051.safetensors",
774
+ "layers.86.attention.wv.weight": "consolidated-00049-of-00051.safetensors",
775
+ "layers.86.attention_norm.weight": "consolidated-00049-of-00051.safetensors",
776
+ "layers.86.feed_forward.w1.weight": "consolidated-00049-of-00051.safetensors",
777
+ "layers.86.feed_forward.w2.weight": "consolidated-00049-of-00051.safetensors",
778
+ "layers.86.feed_forward.w3.weight": "consolidated-00050-of-00051.safetensors",
779
+ "layers.86.ffn_norm.weight": "consolidated-00050-of-00051.safetensors",
780
+ "layers.87.attention.wk.weight": "consolidated-00050-of-00051.safetensors",
781
+ "layers.87.attention.wo.weight": "consolidated-00050-of-00051.safetensors",
782
+ "layers.87.attention.wq.weight": "consolidated-00050-of-00051.safetensors",
783
+ "layers.87.attention.wv.weight": "consolidated-00050-of-00051.safetensors",
784
+ "layers.87.attention_norm.weight": "consolidated-00050-of-00051.safetensors",
785
+ "layers.87.feed_forward.w1.weight": "consolidated-00050-of-00051.safetensors",
786
+ "layers.87.feed_forward.w2.weight": "consolidated-00050-of-00051.safetensors",
787
+ "layers.87.feed_forward.w3.weight": "consolidated-00050-of-00051.safetensors",
788
+ "layers.87.ffn_norm.weight": "consolidated-00050-of-00051.safetensors",
789
+ "layers.9.attention.wk.weight": "consolidated-00050-of-00051.safetensors",
790
+ "layers.9.attention.wo.weight": "consolidated-00050-of-00051.safetensors",
791
+ "layers.9.attention.wq.weight": "consolidated-00050-of-00051.safetensors",
792
+ "layers.9.attention.wv.weight": "consolidated-00050-of-00051.safetensors",
793
+ "layers.9.attention_norm.weight": "consolidated-00050-of-00051.safetensors",
794
+ "layers.9.feed_forward.w1.weight": "consolidated-00050-of-00051.safetensors",
795
+ "layers.9.feed_forward.w2.weight": "consolidated-00051-of-00051.safetensors",
796
+ "layers.9.feed_forward.w3.weight": "consolidated-00051-of-00051.safetensors",
797
+ "layers.9.ffn_norm.weight": "consolidated-00051-of-00051.safetensors",
798
+ "norm.weight": "consolidated-00051-of-00051.safetensors",
799
+ "output.weight": "consolidated-00051-of-00051.safetensors",
800
+ "tok_embeddings.weight": "consolidated-00051-of-00051.safetensors"
801
+ }
802
+ }
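The weight map above assigns every tensor name to one of 51 shard files. A minimal sketch of how such an index could be sanity-checked, assuming the `weight_map` object has already been loaded into a plain dict; `summarize_weight_map` and `missing_shards` are hypothetical helper names, not part of any library, and the three-shard `example` is made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

# Shard names in the index follow consolidated-NNNNN-of-MMMMM.safetensors.
SHARD_RE = re.compile(r"^consolidated-(\d{5})-of-(\d{5})\.safetensors$")


def summarize_weight_map(weight_map: Dict[str, str]) -> Counter:
    """Count how many tensors the index assigns to each shard file."""
    return Counter(weight_map.values())


def missing_shards(weight_map: Dict[str, str]) -> List[int]:
    """Return shard numbers in 1..total that no tensor refers to."""
    seen = set()
    total = 0
    for filename in weight_map.values():
        match = SHARD_RE.match(filename)
        if match is None:
            raise ValueError(f"unexpected shard name: {filename}")
        seen.add(int(match.group(1)))
        total = max(total, int(match.group(2)))
    return [n for n in range(1, total + 1) if n not in seen]


# Hypothetical miniature index: shard 2 of 3 is never referenced.
example = {
    "a.weight": "consolidated-00001-of-00003.safetensors",
    "b.weight": "consolidated-00001-of-00003.safetensors",
    "c.weight": "consolidated-00003-of-00003.safetensors",
}
counts = summarize_weight_map(example)
gaps = missing_shards(example)
```

For the real file, the dict would come from `json.load(...)["weight_map"]` on `consolidated.safetensors.index.json`; an empty result from `missing_shards` would mean every shard from 1 to 51 is referenced at least once.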
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.42.3"
6
+ }
measurement.json ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f9b034ef5d9d85762599a98122ad0e5a947058dac60fa5a86d16fdb89cc936e9
3
+ size 8427846088
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39e4b5cdf3db812864629774d44b888ef1a1d5c3bf830135d2d6d458c20af802
3
+ size 8315496424
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3ded54779a95e5782872f2112a55d6c64e38dceeaa5292740d5d84c095009020
3
+ size 8315496424
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:141296253f5b9e1c1b9e3dc5e0712637a68ba6a69362cd705141f61c277915af
+ size 6488395528
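
Each `model-0000N-of-00004.safetensors` entry above is a Git LFS pointer file (three `key value` lines) rather than the weights themselves. As a minimal sketch, the pointer format can be parsed and the shard sizes summed as follows. The helper name `parse_lfs_pointer` is hypothetical and not part of this commit; the pointer text and sizes are copied from the hunks above.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer file into its key/value fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the model-00004-of-00004.safetensors hunk.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:141296253f5b9e1c1b9e3dc5e0712637a68ba6a69362cd705141f61c277915af
size 6488395528
"""

pointer = parse_lfs_pointer(pointer_text)
algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])

# Sizes of the four model shards, taken from the pointer files in this commit.
shard_sizes = [8427846088, 8315496424, 8315496424, 6488395528]
total_bytes = sum(shard_sizes)

print(algo, digest[:12], size_bytes)
print(f"total: {total_bytes} bytes ({total_bytes / 2**30:.2f} GiB)")
```

The `oid` is the SHA-256 of the real file, so it can be compared against a locally computed hash after download.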
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
params.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "dim": 12288,
+   "n_layers": 88,
+   "head_dim": 128,
+   "hidden_dim": 28672,
+   "n_heads": 96,
+   "n_kv_heads": 8,
+   "norm_eps": 1e-05,
+   "vocab_size": 32768,
+   "rope_theta": 1000000.0
+ }
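
The values in `params.json` are enough to estimate the parameter count. The sketch below assumes a bias-free layout in which each tensor named in the weight map (`wq`, `wk`, `wv`, `wo`, `w1`, `w2`, `w3`, the two per-layer norms, `norm`, `output`, `tok_embeddings`) has the usual grouped-query-attention shape. Those shapes are an assumption; this commit lists tensor names but not their dimensions. The function name `count_parameters` is hypothetical.

```python
# Values copied from params.json in this commit.
params = {
    "dim": 12288,
    "n_layers": 88,
    "head_dim": 128,
    "hidden_dim": 28672,
    "n_heads": 96,
    "n_kv_heads": 8,
    "vocab_size": 32768,
}


def count_parameters(p: dict) -> int:
    """Estimate the total parameter count, assuming no bias terms (hypothetical helper)."""
    dim, hidden = p["dim"], p["hidden_dim"]
    q_out = p["n_heads"] * p["head_dim"]      # query projection width
    kv_out = p["n_kv_heads"] * p["head_dim"]  # key/value projection width (GQA)
    attention = dim * q_out + 2 * dim * kv_out + q_out * dim  # wq, wk, wv, wo
    feed_forward = 3 * dim * hidden                           # w1, w2, w3
    norms = 2 * dim                                           # attention_norm, ffn_norm
    per_layer = attention + feed_forward + norms
    embeddings = 2 * p["vocab_size"] * dim                    # tok_embeddings, output
    final_norm = dim                                          # norm
    return p["n_layers"] * per_layer + embeddings + final_norm


total = count_parameters(params)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```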
quantization_config.json ADDED
The diff for this file is too large to render. See raw diff
 
test.py ADDED
@@ -0,0 +1,26 @@
+ import json
+ from typing import Dict
+
+ from safetensors.torch import load_file, save_file
+ from huggingface_hub import split_torch_state_dict_into_shards
+ import torch
+ import os
+
+ def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str):
+     state_dict_split = split_torch_state_dict_into_shards(state_dict, filename_pattern='consolidated{suffix}.safetensors')
+     for filename, tensors in state_dict_split.filename_to_tensors.items():
+         shard = {tensor: state_dict[tensor] for tensor in tensors}  # tensors belonging to this shard
+         print("Saving", save_directory, filename)
+         save_file(shard, os.path.join(save_directory, filename))
+     if state_dict_split.is_sharded:  # write an index only when more than one shard was produced
+         index = {
+             "metadata": state_dict_split.metadata,
+             "weight_map": state_dict_split.tensor_to_filename,
+         }
+         with open(os.path.join(save_directory, "consolidated.safetensors.index.json"), "w") as f:
+             f.write(json.dumps(index, indent=2))
+
+ big_file = 'consolidated.safetensors'
+ loaded = load_file(big_file)  # loads the whole checkpoint into memory before re-sharding
+
+ save_state_dict(loaded, save_directory='.')
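
`test.py` writes a `consolidated.safetensors.index.json` whose `weight_map` maps each tensor name to a shard filename, as in the hunk earlier in this commit. A minimal sketch for sanity-checking such an index is shown below. The helper names `tensors_per_shard` and `missing_shards` are hypothetical and not part of the commit, and the sketch assumes the shards sit in the same directory as the index.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List


def tensors_per_shard(index_path: str) -> Dict[str, int]:
    """Count how many tensors the index assigns to each shard file."""
    index = json.loads(Path(index_path).read_text())
    return dict(Counter(index["weight_map"].values()))


def missing_shards(index_path: str) -> List[str]:
    """List shard files named in the index that are absent next to it."""
    index_file = Path(index_path)
    index = json.loads(index_file.read_text())
    shards = set(index["weight_map"].values())
    return sorted(s for s in shards if not (index_file.parent / s).exists())


if __name__ == "__main__":
    path = "consolidated.safetensors.index.json"
    if Path(path).exists():
        for shard, count in sorted(tensors_per_shard(path).items()):
            print(f"{shard}: {count} tensors")
        print("missing:", missing_shards(path))
```

Only the JSON index is read, so the check runs without loading any weights.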
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+ size 587583
tokenizer.model.v3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+ size 587583
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff