JunHowie committed
Commit 970b079 · verified · 1 Parent(s): 64197b4

Upload folder using huggingface_hub
.msc CHANGED
Binary files a/.msc and b/.msc differ

.mv CHANGED
@@ -1 +1 @@
- Revision:master,CreatedAt:1745983745
+ Revision:master,CreatedAt:1746673573
README.md CHANGED
@@ -1,354 +1,35 @@
- ---
- library_name: transformers
  license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE
- pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen3-4B-Base
- ---

- # Qwen3-4B
- <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
-     <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
- </a>

- ## Qwen3 Highlights
- 
- Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
- 
- - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- - **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- - **Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
- 
- ## Model Overview
- 
- **Qwen3-4B** has the following features:
- - Type: Causal Language Models
- - Training Stage: Pretraining & Post-training
- - Number of Parameters: 4.0B
- - Number of Parameters (Non-Embedding): 3.6B
- - Number of Layers: 36
- - Number of Attention Heads (GQA): 32 for Q and 8 for KV
- - Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).
- 
- For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
- 
- > [!TIP]
- > If you encounter significant endless repetitions, please refer to the [Best Practices](#best-practices) section for optimal sampling parameters, and set `presence_penalty` to 1.5.
- 
- ## Quickstart
- 
- The code for Qwen3 has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
- 
- With `transformers<4.51.0`, you will encounter the following error:
- ```
- KeyError: 'qwen3'
  ```

- The following code snippet illustrates how to use the model to generate content based on given inputs.
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- 
- model_name = "Qwen/Qwen3-4B"
- 
- # load the tokenizer and the model
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- 
- # prepare the model input
- prompt = "Give me a short introduction to large language models."
- messages = [
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
- 
- # conduct text completion
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=32768
- )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

- # parse thinking content
- try:
-     # rindex finding 151668 (</think>)
-     index = len(output_ids) - output_ids[::-1].index(151668)
- except ValueError:
-     index = 0
- 
- thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
- content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
- 
- print("thinking content:", thinking_content)
- print("content:", content)
  ```
- 
- For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
- - SGLang:
- ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-4B --reasoning-parser qwen3
- ```
- - vLLM:
- ```shell
- vllm serve Qwen/Qwen3-4B --enable-reasoning --reasoning-parser deepseek_r1
- ```
- 
- For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
- 
- ## Switching Between Thinking and Non-Thinking Mode
- 
- > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
- > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.
- 
- ### `enable_thinking=True`
- 
- By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting `enable_thinking=True` or leaving it as the default value in `tokenizer.apply_chat_template`, the model will engage its thinking mode.
- 
- ```python
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=True  # True is the default value for enable_thinking
- )
  ```

- In this mode, the model will generate thinking content wrapped in a `<think>...</think>` block, followed by the final response.

- > [!NOTE]
- > For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
- 
- ### `enable_thinking=False`
- 
- We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.
- 
- ```python
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=False  # Setting enable_thinking=False disables thinking mode
- )
  ```
- 
- In this mode, the model will not generate any thinking content and will not include a `<think>...</think>` block.
- 
- > [!NOTE]
- > For non-thinking mode, we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
- 
- ### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input
- 
- We provide a soft-switch mechanism that allows users to dynamically control the model's behavior when `enable_thinking=True`. Specifically, you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
- 
- Here is an example of a multi-turn conversation:
- 
  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- 
- class QwenChatbot:
-     def __init__(self, model_name="Qwen/Qwen3-4B"):
-         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-         self.model = AutoModelForCausalLM.from_pretrained(model_name)
-         self.history = []
- 
-     def generate_response(self, user_input):
-         messages = self.history + [{"role": "user", "content": user_input}]
- 
-         text = self.tokenizer.apply_chat_template(
-             messages,
-             tokenize=False,
-             add_generation_prompt=True
-         )
- 
-         inputs = self.tokenizer(text, return_tensors="pt")
-         response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
-         response = self.tokenizer.decode(response_ids, skip_special_tokens=True)
- 
-         # Update history
-         self.history.append({"role": "user", "content": user_input})
-         self.history.append({"role": "assistant", "content": response})
- 
-         return response
- 
- # Example usage
- if __name__ == "__main__":
-     chatbot = QwenChatbot()
- 
-     # First input (without /think or /no_think tags, thinking mode is enabled by default)
-     user_input_1 = "How many r's in strawberries?"
-     print(f"User: {user_input_1}")
-     response_1 = chatbot.generate_response(user_input_1)
-     print(f"Bot: {response_1}")
-     print("----------------------")
- 
-     # Second input with /no_think
-     user_input_2 = "Then, how many r's in blueberries? /no_think"
-     print(f"User: {user_input_2}")
-     response_2 = chatbot.generate_response(user_input_2)
-     print(f"Bot: {response_2}")
-     print("----------------------")
- 
-     # Third input with /think
-     user_input_3 = "Really? /think"
-     print(f"User: {user_input_3}")
-     response_3 = chatbot.generate_response(user_input_3)
-     print(f"Bot: {response_3}")
  ```
- 
- > [!NOTE]
- > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
- > When `enable_thinking=False`, the soft switches have no effect: regardless of any `/think` or `/no_think` tags input by the user, the model will not generate thinking content and will not include a `<think>...</think>` block.
- 
- ## Agentic Use
- 
- Qwen3 excels in tool-calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of the agentic abilities of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
- 
- To define the available tools, you can use an MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.
- ```python
- from qwen_agent.agents import Assistant
- 
- # Define LLM
- llm_cfg = {
-     'model': 'Qwen3-4B',
- 
-     # Use the endpoint provided by Alibaba Model Studio:
-     # 'model_type': 'qwen_dashscope',
-     # 'api_key': os.getenv('DASHSCOPE_API_KEY'),
- 
-     # Use a custom endpoint compatible with OpenAI API:
-     'model_server': 'http://localhost:8000/v1',  # api_base
-     'api_key': 'EMPTY',
- 
-     # Other parameters:
-     # 'generate_cfg': {
-     #     # Add: when the response content is `<think>this is the thought</think>this is the answer`;
-     #     # Do not add: when the response has been separated into reasoning_content and content.
-     #     'thought_in_content': True,
-     # },
- }
- 
- # Define Tools
- tools = [
-     {'mcpServers': {  # You can specify the MCP configuration file
-         'time': {
-             'command': 'uvx',
-             'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
-         },
-         'fetch': {
-             'command': 'uvx',
-             'args': ['mcp-server-fetch']
-         }
-     }},
-     'code_interpreter',  # Built-in tools
- ]
- 
- # Define Agent
- bot = Assistant(llm=llm_cfg, function_list=tools)
- 
- # Streaming generation
- messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
- for responses in bot.run(messages=messages):
-     pass
- print(responses)
  ```
- 
- ## Processing Long Texts
- 
- Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
- 
- YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, and `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
- 
- - Modifying the model files:
-   In the `config.json` file, add the `rope_scaling` fields:
-   ```json
-   {
-       ...,
-       "rope_scaling": {
-           "rope_type": "yarn",
-           "factor": 4.0,
-           "original_max_position_embeddings": 32768
-       }
-   }
-   ```
-   For `llama.cpp`, you need to regenerate the GGUF file after the modification.
- 
- - Passing command line arguments:
- 
-   For `vllm`, you can use
-   ```shell
-   vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
-   ```
- 
-   For `sglang`, you can use
-   ```shell
-   python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
-   ```
- 
-   For `llama-server` from `llama.cpp`, you can use
-   ```shell
-   llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
-   ```
- 
- > [!IMPORTANT]
- > If you encounter the following warning
- > ```
- > Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
- > ```
- > please upgrade to `transformers>=4.51.0`.
- 
- > [!NOTE]
- > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.**
- > We advise adding the `rope_scaling` configuration only when processing long contexts is required.
- > It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` to 2.0.
- 
- > [!NOTE]
- > The default `max_position_embeddings` in `config.json` is set to 40,960. This allocation reserves 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN, as it may potentially degrade model performance.
- 
- > [!TIP]
- > The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default, and no extra configuration is needed.
- 
- ## Best Practices
- 
- To achieve optimal performance, we recommend the following settings:
- 
- 1. **Sampling Parameters**:
-    - For thinking mode (`enable_thinking=True`), use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`. **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
-    - For non-thinking mode (`enable_thinking=False`), we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
-    - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance. (A minimal sketch applying these settings follows this list.)
- 
- 2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
- 
- 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
-    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
-    - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
- 
- 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This is already implemented in the provided Jinja2 chat template. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that this best practice is followed.
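The recommendations in item 1 are prose-only in the card; as a rough, unofficial sketch, they map onto Hugging Face `generate()` arguments as below. The model name and prompt are placeholders, and `presence_penalty` is an API-level knob in vLLM/SGLang rather than a `generate()` argument:

```python
# A minimal sketch (not an official snippet) applying the recommended
# thinking-mode sampling parameters with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # placeholder; any Qwen3 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many r's are in strawberry?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Thinking-mode settings from item 1; sampling (not greedy decoding) is required.
outputs = model.generate(
    **inputs,
    max_new_tokens=32768,  # adequate output length, per item 2
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```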
- 
- ### Citation
- 
- If you find our work helpful, feel free to cite us.
- 
  ```
- @misc{qwen3,
-     title = {Qwen3},
-     url = {https://qwenlm.github.io/blog/qwen3/},
-     author = {Qwen Team},
-     month = {April},
-     year = {2025}
- }
- ```

  license: apache-2.0

+ # Tongyi Qianwen Qwen3-30B-A3B-GPTQ-Int4 Quantization
+ Base model: [Qwen3-30B-A3B](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B)

+ ### Recent updates
+ ```
+ 2025-05-08
+ fix: (model.layers.*.mlp.gate) were not quantized
  ```

+ ### Dependencies

  ```
+ vllm==0.8.5
  ```
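An illustrative client-side sketch (an editorial addition, not part of the original card): with `vllm==0.8.5` the checkpoint can be exposed through an OpenAI-compatible server, e.g. `vllm serve JunHowie/Qwen3-30B-A3B-GPTQ-Int4`, mirroring the base model card's command; the endpoint and served model name below are assumptions:

```python
# Hypothetical client sketch against a local OpenAI-compatible vLLM endpoint.
# Assumes a server started with: vllm serve JunHowie/Qwen3-30B-A3B-GPTQ-Int4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="JunHowie/Qwen3-30B-A3B-GPTQ-Int4",  # must match the served model name
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
    presence_penalty=1.5,  # the base card suggests this against endless repetition
)
print(resp.choices[0].message.content)
```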

+ SDK download
+ ```bash
+ # Install ModelScope
+ pip install modelscope
  ```
  ```python
+ # Download the model via the SDK
+ from modelscope import snapshot_download
+ model_dir = snapshot_download('JunHowie/Qwen3-30B-A3B-GPTQ-Int4')
  ```
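A possible next step (illustrative only, not part of the original card) is to pass the returned `model_dir` straight to an inference framework such as vLLM, which picks up the GPTQ quantization settings from the checkpoint's `config.json`:

```python
# Illustrative sketch: download with ModelScope, then run offline inference with vLLM.
from modelscope import snapshot_download
from vllm import LLM, SamplingParams

model_dir = snapshot_download('JunHowie/Qwen3-30B-A3B-GPTQ-Int4')
llm = LLM(model=model_dir)  # GPTQ-Int4 weights are detected from the checkpoint config
params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, max_tokens=512)
outputs = llm.chat([{"role": "user", "content": "Hello!"}], params)
print(outputs[0].outputs[0].text)
```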
+ Git download
  ```
+ # Download the model via git
+ git clone https://www.modelscope.cn/JunHowie/Qwen3-30B-A3B-GPTQ-Int4.git
  ```
+ 
+ <p style="color: lightgrey;">If you are a contributor to this model, we invite you to keep the model card complete and up to date, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
config.json CHANGED
@@ -31,15 +31,17 @@
  "group_size": 128,
  "lm_head": false,
  "meta": {
- "damp_auto_increment": 0.0025,
- "damp_percent": 0.01,
+ "damp_auto_increment": 0.01,
+ "damp_percent": 0.05,
  "mse": 0.0,
  "quantizer": [
- "gptqmodel:2.2.0"
+ "gptqmodel:4.0.0-dev"
  ],
  "static_groups": false,
  "true_sequential": true,
- "uri": "https://github.com/modelcloud/gptqmodel"
+ "uri": "https://github.com/modelcloud/gptqmodel",
+ "v2": false,
+ "v2_alpha": 0.25
  },
  "pack_dtype": "int32",
  "quant_method": "gptq",
model-00001-of-00005.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7e3faf1da5e6f718ead6172a721bb78716057b6aae09f60d4251d225580e0dc7
- size 4001671816
+ oid sha256:15e5be8644a3e4c3ad1f05b48d6caf6c57d09772085dc9142a5560c4d2b7b242
+ size 4001615168
model-00002-of-00005.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fbf579aa1ceb1ecda8d7359f8ee278ccf24f8b5a72e9b0ff55f43ee291178674
- size 4002063104
+ oid sha256:621653d6bc683384afa8df31474c531a31012df22f9b33ebcb5cc1335dd9d070
+ size 4001632008
model-00003-of-00005.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ef5532d2e50ab7e732432ba20852b93e351feede7b51afe7ad16dc214e6822c5
- size 4002068288
+ oid sha256:84eaf9be046f6dd2a8c73b8d806e3cb7c3b0f7f1eac788f9c7922aa8ef31a0bb
+ size 4001632136
model-00004-of-00005.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c0057666952e9e82931b9bce81bd28919025234b7154869ccc311f9b9806b519
- size 4001735352
+ oid sha256:9103e328ced3707a04fd14fdacf59c12d58e512f8a1004310b342787ca52ae55
+ size 4001745272
model-00005-of-00005.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:87b1cd78b43ed9faba2d74d2e261c1a5350dd850e2e3b92fe4f7fbac8d1c7b1b
- size 925613664
+ oid sha256:504462bea741d0e85d5cfc85989bb2f20eac5671caca96d5fa537d39bbc94de3
+ size 908307360
model.safetensors.index.json CHANGED
@@ -1,6 +1,6 @@
  {
  "metadata": {
- "total_size": 16924176384
  },
  "weight_map": {
  "lm_head.weight": "model-00005-of-00005.safetensors",
@@ -1542,7 +1545,10 @@
  "model.layers.0.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.0.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.0.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.0.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.0.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -3099,7 +3102,10 @@
  "model.layers.1.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.1.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.1.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.1.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.1.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -3876,10 +3882,10 @@
  "model.layers.10.mlp.experts.4.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.4.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.4.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.10.mlp.experts.40.down_proj.g_idx": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.40.down_proj.qweight": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.40.down_proj.qzeros": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.40.down_proj.scales": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.g_idx": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.qzeros": "model-00001-of-00005.safetensors",
@@ -3888,26 +3894,26 @@
  "model.layers.10.mlp.experts.40.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.down_proj.g_idx": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.down_proj.qweight": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.down_proj.qzeros": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.down_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.gate_proj.g_idx": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.gate_proj.qweight": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.gate_proj.qzeros": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.gate_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.up_proj.g_idx": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.up_proj.qweight": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.up_proj.qzeros": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.41.up_proj.scales": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.g_idx": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.42.gate_proj.g_idx": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.42.gate_proj.qweight": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.42.gate_proj.qzeros": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.experts.42.gate_proj.scales": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.g_idx": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.qzeros": "model-00002-of-00005.safetensors",
@@ -4656,7 +4662,10 @@
  "model.layers.10.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.10.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.10.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -6213,7 +6222,10 @@
  "model.layers.11.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.11.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.11.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.11.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.11.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -7770,7 +7782,10 @@
  "model.layers.12.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.12.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.12.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.12.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.12.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -9327,7 +9342,10 @@
  "model.layers.13.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.13.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.13.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.13.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.13.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -10884,7 +10902,10 @@
  "model.layers.14.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.14.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.14.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.14.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.14.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -12441,7 +12462,10 @@
  "model.layers.15.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.15.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.15.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.15.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.15.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.15.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -13998,7 +14022,10 @@
  "model.layers.16.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.16.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.16.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.16.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.16.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.16.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -15555,7 +15582,10 @@
  "model.layers.17.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.17.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.17.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.17.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.17.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -17112,7 +17142,10 @@
  "model.layers.18.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.18.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.18.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.18.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.18.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.18.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -18669,7 +18702,10 @@
  "model.layers.19.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.19.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.19.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.19.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.19.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.19.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -20226,7 +20262,10 @@
  "model.layers.2.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.2.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.2.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.2.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.2.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -21783,7 +21822,10 @@
  "model.layers.20.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.20.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.20.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.20.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.20.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.20.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -23340,7 +23382,10 @@
  "model.layers.21.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.21.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.21.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.21.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.21.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -24549,50 +24594,50 @@
  "model.layers.22.mlp.experts.72.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.72.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.72.up_proj.scales": "model-00002-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.down_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.down_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.down_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.down_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.gate_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.gate_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.gate_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.gate_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.up_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.up_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.up_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.73.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.down_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.down_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.down_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.down_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.gate_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.gate_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.gate_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.gate_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.up_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.up_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.up_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.74.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.down_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.down_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.down_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.down_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.gate_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.gate_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.gate_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.gate_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.up_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.up_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.up_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.75.up_proj.scales": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.76.gate_proj.g_idx": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.76.gate_proj.qweight": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.76.gate_proj.qzeros": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.experts.76.gate_proj.scales": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.qzeros": "model-00003-of-00005.safetensors",
@@ -24897,7 +24942,10 @@
  "model.layers.22.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.22.mlp.gate.weight": "model-00002-of-00005.safetensors",
  "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.22.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
@@ -26454,7 +26502,10 @@
  "model.layers.23.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.23.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.23.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.23.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.23.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.23.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -28011,7 +28062,10 @@
  "model.layers.24.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.24.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.24.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.24.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.24.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -29568,7 +29622,10 @@
  "model.layers.25.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.25.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.25.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.25.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.25.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -31125,7 +31182,10 @@
  "model.layers.26.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.26.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.26.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.26.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.26.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.26.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -32682,7 +32742,10 @@
  "model.layers.27.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.27.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.27.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.27.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.27.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -34239,7 +34302,10 @@
  "model.layers.28.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.28.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.28.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.28.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.28.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.28.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -35796,7 +35862,10 @@
  "model.layers.29.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.29.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.29.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.29.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.29.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.29.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -37353,7 +37422,10 @@
37353
  "model.layers.3.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
37354
  "model.layers.3.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
37355
  "model.layers.3.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
37356
- "model.layers.3.mlp.gate.weight": "model-00001-of-00005.safetensors",
 
 
 
37357
  "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
37358
  "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
37359
  "model.layers.3.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -38910,7 +38982,10 @@
38910
  "model.layers.30.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
38911
  "model.layers.30.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
38912
  "model.layers.30.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
38913
- "model.layers.30.mlp.gate.weight": "model-00003-of-00005.safetensors",
 
 
 
38914
  "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
38915
  "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
38916
  "model.layers.30.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -40467,7 +40542,10 @@
40467
  "model.layers.31.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
40468
  "model.layers.31.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
40469
  "model.layers.31.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
40470
- "model.layers.31.mlp.gate.weight": "model-00003-of-00005.safetensors",
 
 
 
40471
  "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
40472
  "model.layers.31.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
40473
  "model.layers.31.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -42024,7 +42102,10 @@
42024
  "model.layers.32.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
42025
  "model.layers.32.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
42026
  "model.layers.32.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
42027
- "model.layers.32.mlp.gate.weight": "model-00003-of-00005.safetensors",
 
 
 
42028
  "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
42029
  "model.layers.32.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
42030
  "model.layers.32.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -43581,7 +43662,10 @@
43581
  "model.layers.33.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
43582
  "model.layers.33.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
43583
  "model.layers.33.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
43584
- "model.layers.33.mlp.gate.weight": "model-00003-of-00005.safetensors",
 
 
 
43585
  "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
43586
  "model.layers.33.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
43587
  "model.layers.33.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -43698,66 +43782,66 @@
43698
  "model.layers.34.mlp.experts.104.up_proj.qweight": "model-00003-of-00005.safetensors",
43699
  "model.layers.34.mlp.experts.104.up_proj.qzeros": "model-00003-of-00005.safetensors",
43700
  "model.layers.34.mlp.experts.104.up_proj.scales": "model-00003-of-00005.safetensors",
43701
- "model.layers.34.mlp.experts.105.down_proj.g_idx": "model-00004-of-00005.safetensors",
43702
- "model.layers.34.mlp.experts.105.down_proj.qweight": "model-00004-of-00005.safetensors",
43703
- "model.layers.34.mlp.experts.105.down_proj.qzeros": "model-00004-of-00005.safetensors",
43704
- "model.layers.34.mlp.experts.105.down_proj.scales": "model-00004-of-00005.safetensors",
43705
  "model.layers.34.mlp.experts.105.gate_proj.g_idx": "model-00003-of-00005.safetensors",
43706
  "model.layers.34.mlp.experts.105.gate_proj.qweight": "model-00003-of-00005.safetensors",
43707
  "model.layers.34.mlp.experts.105.gate_proj.qzeros": "model-00003-of-00005.safetensors",
43708
  "model.layers.34.mlp.experts.105.gate_proj.scales": "model-00003-of-00005.safetensors",
43709
- "model.layers.34.mlp.experts.105.up_proj.g_idx": "model-00004-of-00005.safetensors",
43710
- "model.layers.34.mlp.experts.105.up_proj.qweight": "model-00004-of-00005.safetensors",
43711
- "model.layers.34.mlp.experts.105.up_proj.qzeros": "model-00004-of-00005.safetensors",
43712
- "model.layers.34.mlp.experts.105.up_proj.scales": "model-00004-of-00005.safetensors",
43713
- "model.layers.34.mlp.experts.106.down_proj.g_idx": "model-00004-of-00005.safetensors",
43714
- "model.layers.34.mlp.experts.106.down_proj.qweight": "model-00004-of-00005.safetensors",
43715
- "model.layers.34.mlp.experts.106.down_proj.qzeros": "model-00004-of-00005.safetensors",
43716
- "model.layers.34.mlp.experts.106.down_proj.scales": "model-00004-of-00005.safetensors",
43717
- "model.layers.34.mlp.experts.106.gate_proj.g_idx": "model-00004-of-00005.safetensors",
43718
- "model.layers.34.mlp.experts.106.gate_proj.qweight": "model-00004-of-00005.safetensors",
43719
- "model.layers.34.mlp.experts.106.gate_proj.qzeros": "model-00004-of-00005.safetensors",
43720
- "model.layers.34.mlp.experts.106.gate_proj.scales": "model-00004-of-00005.safetensors",
43721
- "model.layers.34.mlp.experts.106.up_proj.g_idx": "model-00004-of-00005.safetensors",
43722
- "model.layers.34.mlp.experts.106.up_proj.qweight": "model-00004-of-00005.safetensors",
43723
- "model.layers.34.mlp.experts.106.up_proj.qzeros": "model-00004-of-00005.safetensors",
43724
- "model.layers.34.mlp.experts.106.up_proj.scales": "model-00004-of-00005.safetensors",
43725
- "model.layers.34.mlp.experts.107.down_proj.g_idx": "model-00004-of-00005.safetensors",
43726
- "model.layers.34.mlp.experts.107.down_proj.qweight": "model-00004-of-00005.safetensors",
43727
- "model.layers.34.mlp.experts.107.down_proj.qzeros": "model-00004-of-00005.safetensors",
43728
- "model.layers.34.mlp.experts.107.down_proj.scales": "model-00004-of-00005.safetensors",
43729
- "model.layers.34.mlp.experts.107.gate_proj.g_idx": "model-00004-of-00005.safetensors",
43730
- "model.layers.34.mlp.experts.107.gate_proj.qweight": "model-00004-of-00005.safetensors",
43731
- "model.layers.34.mlp.experts.107.gate_proj.qzeros": "model-00004-of-00005.safetensors",
43732
- "model.layers.34.mlp.experts.107.gate_proj.scales": "model-00004-of-00005.safetensors",
43733
- "model.layers.34.mlp.experts.107.up_proj.g_idx": "model-00004-of-00005.safetensors",
43734
- "model.layers.34.mlp.experts.107.up_proj.qweight": "model-00004-of-00005.safetensors",
43735
- "model.layers.34.mlp.experts.107.up_proj.qzeros": "model-00004-of-00005.safetensors",
43736
- "model.layers.34.mlp.experts.107.up_proj.scales": "model-00004-of-00005.safetensors",
43737
- "model.layers.34.mlp.experts.108.down_proj.g_idx": "model-00004-of-00005.safetensors",
43738
- "model.layers.34.mlp.experts.108.down_proj.qweight": "model-00004-of-00005.safetensors",
43739
- "model.layers.34.mlp.experts.108.down_proj.qzeros": "model-00004-of-00005.safetensors",
43740
- "model.layers.34.mlp.experts.108.down_proj.scales": "model-00004-of-00005.safetensors",
43741
- "model.layers.34.mlp.experts.108.gate_proj.g_idx": "model-00004-of-00005.safetensors",
43742
- "model.layers.34.mlp.experts.108.gate_proj.qweight": "model-00004-of-00005.safetensors",
43743
- "model.layers.34.mlp.experts.108.gate_proj.qzeros": "model-00004-of-00005.safetensors",
43744
- "model.layers.34.mlp.experts.108.gate_proj.scales": "model-00004-of-00005.safetensors",
43745
- "model.layers.34.mlp.experts.108.up_proj.g_idx": "model-00004-of-00005.safetensors",
43746
- "model.layers.34.mlp.experts.108.up_proj.qweight": "model-00004-of-00005.safetensors",
43747
- "model.layers.34.mlp.experts.108.up_proj.qzeros": "model-00004-of-00005.safetensors",
43748
- "model.layers.34.mlp.experts.108.up_proj.scales": "model-00004-of-00005.safetensors",
43749
- "model.layers.34.mlp.experts.109.down_proj.g_idx": "model-00004-of-00005.safetensors",
43750
- "model.layers.34.mlp.experts.109.down_proj.qweight": "model-00004-of-00005.safetensors",
43751
- "model.layers.34.mlp.experts.109.down_proj.qzeros": "model-00004-of-00005.safetensors",
43752
- "model.layers.34.mlp.experts.109.down_proj.scales": "model-00004-of-00005.safetensors",
43753
- "model.layers.34.mlp.experts.109.gate_proj.g_idx": "model-00004-of-00005.safetensors",
43754
- "model.layers.34.mlp.experts.109.gate_proj.qweight": "model-00004-of-00005.safetensors",
43755
- "model.layers.34.mlp.experts.109.gate_proj.qzeros": "model-00004-of-00005.safetensors",
43756
- "model.layers.34.mlp.experts.109.gate_proj.scales": "model-00004-of-00005.safetensors",
43757
- "model.layers.34.mlp.experts.109.up_proj.g_idx": "model-00004-of-00005.safetensors",
43758
- "model.layers.34.mlp.experts.109.up_proj.qweight": "model-00004-of-00005.safetensors",
43759
- "model.layers.34.mlp.experts.109.up_proj.qzeros": "model-00004-of-00005.safetensors",
43760
- "model.layers.34.mlp.experts.109.up_proj.scales": "model-00004-of-00005.safetensors",
43761
  "model.layers.34.mlp.experts.11.down_proj.g_idx": "model-00003-of-00005.safetensors",
43762
  "model.layers.34.mlp.experts.11.down_proj.qweight": "model-00003-of-00005.safetensors",
43763
  "model.layers.34.mlp.experts.11.down_proj.qzeros": "model-00003-of-00005.safetensors",
@@ -43774,10 +43858,10 @@
43774
  "model.layers.34.mlp.experts.110.down_proj.qweight": "model-00004-of-00005.safetensors",
43775
  "model.layers.34.mlp.experts.110.down_proj.qzeros": "model-00004-of-00005.safetensors",
43776
  "model.layers.34.mlp.experts.110.down_proj.scales": "model-00004-of-00005.safetensors",
43777
- "model.layers.34.mlp.experts.110.gate_proj.g_idx": "model-00004-of-00005.safetensors",
43778
- "model.layers.34.mlp.experts.110.gate_proj.qweight": "model-00004-of-00005.safetensors",
43779
- "model.layers.34.mlp.experts.110.gate_proj.qzeros": "model-00004-of-00005.safetensors",
43780
- "model.layers.34.mlp.experts.110.gate_proj.scales": "model-00004-of-00005.safetensors",
43781
  "model.layers.34.mlp.experts.110.up_proj.g_idx": "model-00004-of-00005.safetensors",
43782
  "model.layers.34.mlp.experts.110.up_proj.qweight": "model-00004-of-00005.safetensors",
43783
  "model.layers.34.mlp.experts.110.up_proj.qzeros": "model-00004-of-00005.safetensors",
@@ -45138,7 +45222,10 @@
  "model.layers.34.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
- "model.layers.34.mlp.gate.weight": "model-00003-of-00005.safetensors",
  "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.34.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.34.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
@@ -46695,7 +46782,10 @@
  "model.layers.35.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.35.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.35.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.35.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.35.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.35.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -48252,7 +48342,10 @@
  "model.layers.36.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.36.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.36.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.36.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.36.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.36.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -49809,7 +49902,10 @@
  "model.layers.37.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.37.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.37.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.37.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.37.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.37.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -51366,7 +51462,10 @@
  "model.layers.38.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.38.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.38.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.38.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.38.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.38.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.38.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -52923,7 +53022,10 @@
  "model.layers.39.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.39.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.39.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.39.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.39.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.39.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.39.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -54480,7 +54582,10 @@
  "model.layers.4.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.4.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.4.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.4.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.4.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -56037,7 +56142,10 @@
  "model.layers.40.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.40.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.40.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.40.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.40.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.40.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.40.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -57594,7 +57702,10 @@
  "model.layers.41.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.41.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.41.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.41.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.41.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.41.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -59151,7 +59262,10 @@
  "model.layers.42.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.42.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.42.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.42.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.42.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.42.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.42.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -60708,7 +60822,10 @@
  "model.layers.43.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.43.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.43.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.43.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.43.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.43.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.43.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -62265,7 +62382,10 @@
  "model.layers.44.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.44.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.44.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.44.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.44.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.44.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.44.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -63822,7 +63942,10 @@
  "model.layers.45.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.45.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.45.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.45.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.45.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.45.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.45.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -65379,7 +65502,10 @@
  "model.layers.46.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.46.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.46.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.46.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.46.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.46.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.46.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -65424,18 +65550,18 @@
  "model.layers.47.mlp.experts.1.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.1.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.1.up_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.10.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -65556,18 +65682,18 @@
  "model.layers.47.mlp.experts.109.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.109.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.109.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.11.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -65692,10 +65818,10 @@
  "model.layers.47.mlp.experts.12.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.down_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.12.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.12.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.12.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.12.gate_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66276,18 +66402,18 @@
  "model.layers.47.mlp.experts.49.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.49.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.49.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.down_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.g_idx": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.scales": "model-00004-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.5.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66408,18 +66534,18 @@
  "model.layers.47.mlp.experts.59.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.59.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.59.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.6.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66540,18 +66666,18 @@
  "model.layers.47.mlp.experts.69.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.69.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.69.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.7.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66672,18 +66798,18 @@
  "model.layers.47.mlp.experts.79.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.79.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.79.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.8.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66804,18 +66930,18 @@
  "model.layers.47.mlp.experts.89.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.89.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.89.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.down_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.down_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.down_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.down_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.gate_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.gate_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.gate_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.gate_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.up_proj.g_idx": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.up_proj.qweight": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.up_proj.qzeros": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.experts.9.up_proj.scales": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66936,7 +67062,10 @@
  "model.layers.47.mlp.experts.99.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.99.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.99.up_proj.scales": "model-00005-of-00005.safetensors",
- "model.layers.47.mlp.gate.weight": "model-00004-of-00005.safetensors",
  "model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
  "model.layers.47.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.47.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -68493,7 +68622,10 @@
  "model.layers.5.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.5.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.5.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.5.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.5.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -70050,7 +70182,10 @@
  "model.layers.6.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.6.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.6.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.6.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.6.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -71607,7 +71742,10 @@
  "model.layers.7.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.7.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.7.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.7.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.7.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -73164,7 +73302,10 @@
  "model.layers.8.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.8.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.8.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.8.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.8.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
@@ -74721,7 +74862,10 @@
  "model.layers.9.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.9.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.9.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
- "model.layers.9.mlp.gate.weight": "model-00001-of-00005.safetensors",
  "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.9.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  {
  "metadata": {
+ "total_size": 16905940992
  },
  "weight_map": {
  "lm_head.weight": "model-00005-of-00005.safetensors",
 
  "model.layers.0.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.0.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.0.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.0.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.0.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.0.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.0.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.0.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  "model.layers.1.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.1.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.1.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.1.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.1.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.1.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.1.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.1.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  "model.layers.10.mlp.experts.4.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.4.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.4.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.40.down_proj.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.40.down_proj.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.40.down_proj.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.40.down_proj.scales": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.g_idx": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.gate_proj.qzeros": "model-00001-of-00005.safetensors",
 
  "model.layers.10.mlp.experts.40.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.40.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.down_proj.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.down_proj.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.down_proj.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.down_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.gate_proj.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.gate_proj.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.gate_proj.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.gate_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.up_proj.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.up_proj.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.up_proj.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.41.up_proj.scales": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.g_idx": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.down_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.10.mlp.experts.42.gate_proj.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.42.gate_proj.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.42.gate_proj.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.experts.42.gate_proj.scales": "model-00001-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.g_idx": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.42.up_proj.qzeros": "model-00002-of-00005.safetensors",
 
  "model.layers.10.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.10.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.10.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.10.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.10.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  "model.layers.11.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.11.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.11.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.11.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.11.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.11.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.11.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.11.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.12.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.12.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.12.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.12.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.12.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.12.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.12.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.12.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.13.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.13.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.13.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.13.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.13.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.13.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.13.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.13.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.14.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.14.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.14.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.14.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.14.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.14.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.14.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.14.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.15.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.15.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.15.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.15.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.15.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.15.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.15.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.15.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.15.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.16.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.16.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.16.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.16.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.16.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.16.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.16.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.16.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.16.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.17.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.17.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.17.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.17.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.17.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.17.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.17.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.17.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.18.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.18.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.18.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.18.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.18.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.18.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.18.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.18.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.18.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.19.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.19.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.19.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.19.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.19.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.19.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.19.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.19.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.19.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.2.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.2.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.2.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.2.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.2.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.2.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.2.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.2.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  "model.layers.20.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.20.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.20.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.20.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.20.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.20.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.20.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.20.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.20.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.21.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.21.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.21.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.21.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.21.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.21.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.21.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
  "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.21.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.22.mlp.experts.72.up_proj.qweight": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.72.up_proj.qzeros": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.72.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.down_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.down_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.down_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.down_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.gate_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.gate_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.up_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.up_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.up_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.73.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.down_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.down_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.down_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.down_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.gate_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.gate_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.up_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.up_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.up_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.74.up_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.down_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.down_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.down_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.down_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.gate_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.gate_proj.scales": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.up_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.up_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.up_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.75.up_proj.scales": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.down_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.22.mlp.experts.76.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.76.gate_proj.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.76.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.experts.76.gate_proj.scales": "model-00002-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.76.up_proj.qzeros": "model-00003-of-00005.safetensors",
 
  "model.layers.22.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.22.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.22.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+ "model.layers.22.mlp.gate.scales": "model-00002-of-00005.safetensors",
  "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
  "model.layers.22.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
 
  "model.layers.23.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.23.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.23.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.23.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.23.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.23.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.23.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.23.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.23.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.24.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.24.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.24.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.24.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.24.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.24.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.24.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.24.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.25.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.25.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.25.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.25.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.25.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.25.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.25.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.25.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.26.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.26.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.26.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.26.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.26.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.26.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.26.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.26.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.26.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.27.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.27.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.27.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.27.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.27.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.27.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.27.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.27.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.28.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.28.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.28.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.28.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.28.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.28.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.28.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.28.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.28.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.29.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.29.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.29.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.29.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.29.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.29.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.29.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.29.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.29.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.3.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.3.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.3.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.3.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.3.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.3.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.3.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.3.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
 
  "model.layers.30.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.30.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.30.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.30.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.30.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.30.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.30.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.30.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.31.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.31.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.31.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.31.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.31.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.31.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.31.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.31.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.31.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.32.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.32.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.32.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.32.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.32.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.32.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.32.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
  "model.layers.32.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.32.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
43662
  "model.layers.33.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
43663
  "model.layers.33.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
43664
  "model.layers.33.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
43665
+ "model.layers.33.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
43666
+ "model.layers.33.mlp.gate.qweight": "model-00003-of-00005.safetensors",
43667
+ "model.layers.33.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
43668
+ "model.layers.33.mlp.gate.scales": "model-00003-of-00005.safetensors",
43669
  "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
43670
  "model.layers.33.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
43671
  "model.layers.33.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.34.mlp.experts.104.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.104.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.104.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.down_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.down_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.down_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.down_proj.scales": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.105.gate_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.105.gate_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.105.gate_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.105.gate_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.up_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.up_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.up_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.105.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.down_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.down_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.down_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.down_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.gate_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.gate_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.up_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.up_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.up_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.106.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.down_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.down_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.down_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.down_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.gate_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.gate_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.up_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.up_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.up_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.107.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.down_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.down_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.down_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.down_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.gate_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.gate_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.up_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.up_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.up_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.108.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.down_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.down_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.down_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.down_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.gate_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.gate_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.up_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.up_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.up_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.109.up_proj.scales": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.11.down_proj.g_idx": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.11.down_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.11.down_proj.qzeros": "model-00003-of-00005.safetensors",
 
  "model.layers.34.mlp.experts.110.down_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.34.mlp.experts.110.down_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.34.mlp.experts.110.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.34.mlp.experts.110.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.110.gate_proj.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.110.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.experts.110.gate_proj.scales": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.110.up_proj.g_idx": "model-00004-of-00005.safetensors",
  "model.layers.34.mlp.experts.110.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.34.mlp.experts.110.up_proj.qzeros": "model-00004-of-00005.safetensors",

  "model.layers.34.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
  "model.layers.34.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+ "model.layers.34.mlp.gate.scales": "model-00003-of-00005.safetensors",
  "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.34.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
  "model.layers.34.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
 
  "model.layers.35.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.35.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.35.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.35.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.35.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.35.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.35.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.35.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.35.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.36.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.36.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.36.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.36.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.36.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.36.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.36.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.36.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.36.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.37.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.37.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.37.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.37.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.37.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.37.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.37.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.37.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.37.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.38.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.38.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.38.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.38.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.38.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.38.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.38.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.38.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.38.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.38.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.39.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.39.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.39.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.39.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.39.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.39.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.39.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.39.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.39.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.39.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.4.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.4.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.4.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.4.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.4.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.4.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.4.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.4.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

  "model.layers.40.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.40.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.40.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.40.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.40.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.40.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.40.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.40.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.40.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.40.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.41.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.41.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.41.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.41.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.41.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.41.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.41.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.41.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.41.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.42.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.42.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.42.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.42.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.42.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.42.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.42.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.42.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.42.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.42.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.43.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.43.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.43.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.43.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.43.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.43.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.43.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.43.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.43.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.43.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.44.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.44.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.44.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.44.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.44.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.44.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.44.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.44.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.44.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.44.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.45.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.45.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.45.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.45.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.45.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.45.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.45.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.45.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.45.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.45.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

  "model.layers.46.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.46.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.46.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.46.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.46.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.46.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.46.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.46.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
  "model.layers.46.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.46.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.1.up_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.1.up_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.1.up_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.10.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.100.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.109.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.109.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.109.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.11.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.110.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.12.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.down_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.down_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.12.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.12.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.12.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.12.gate_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.12.up_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.49.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.49.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.49.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.down_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.g_idx": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.qweight": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.qzeros": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.5.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.5.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.50.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.59.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.59.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.59.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.6.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.60.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.69.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.69.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.69.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.7.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.70.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.79.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.79.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.79.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.8.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.80.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.89.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.89.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.89.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.down_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.down_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.down_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.down_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.gate_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.gate_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.gate_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.gate_proj.scales": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.up_proj.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.up_proj.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.up_proj.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.experts.9.up_proj.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.g_idx": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.90.down_proj.qzeros": "model-00005-of-00005.safetensors",
 
  "model.layers.47.mlp.experts.99.up_proj.qweight": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.99.up_proj.qzeros": "model-00005-of-00005.safetensors",
  "model.layers.47.mlp.experts.99.up_proj.scales": "model-00005-of-00005.safetensors",
+ "model.layers.47.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.gate.qweight": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
+ "model.layers.47.mlp.gate.scales": "model-00004-of-00005.safetensors",
  "model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
  "model.layers.47.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
  "model.layers.47.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
 
  "model.layers.5.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.5.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.5.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.5.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.5.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.5.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.5.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.5.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

  "model.layers.6.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.6.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.6.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.6.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.6.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.6.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.6.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.6.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

  "model.layers.7.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.7.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.7.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.7.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.7.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.7.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.7.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.7.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

  "model.layers.8.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.8.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.8.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.8.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.8.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.8.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.8.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.8.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

  "model.layers.9.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
  "model.layers.9.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
  "model.layers.9.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+ "model.layers.9.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+ "model.layers.9.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+ "model.layers.9.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+ "model.layers.9.mlp.gate.scales": "model-00001-of-00005.safetensors",
  "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
  "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
  "model.layers.9.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
quantize_config.json CHANGED
@@ -9,13 +9,15 @@
  "pack_dtype": "int32",
  "meta": {
  "quantizer": [
- "gptqmodel:2.2.0"
+ "gptqmodel:4.0.0-dev"
  ],
  "uri": "https://github.com/modelcloud/gptqmodel",
- "damp_percent": 0.01,
- "damp_auto_increment": 0.0025,
+ "damp_percent": 0.05,
+ "damp_auto_increment": 0.01,
  "static_groups": false,
  "true_sequential": true,
- "mse": 0.0
+ "mse": 0.0,
+ "v2": false,
+ "v2_alpha": 0.25
  }
  }
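For completeness, a short sketch of inspecting the updated quantization metadata; the field names and values come from the `quantize_config.json` diff above, while the loading code itself is only an illustration, not part of this commit:

```python
import json

# Inspect the GPTQ settings that changed in this commit: the quantizer
# version bump, the higher damping values, and the new v2 fields.
with open("quantize_config.json") as f:
    cfg = json.load(f)

meta = cfg["meta"]
print(meta["quantizer"])            # ["gptqmodel:4.0.0-dev"], was ["gptqmodel:2.2.0"]
print(meta["damp_percent"])         # 0.05, was 0.01
print(meta["damp_auto_increment"])  # 0.01, was 0.0025
print(meta["v2"], meta["v2_alpha"]) # False 0.25 (new in this config)
```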