Upload folder using huggingface_hub
- .msc +0 -0
- .mv +1 -1
- README.md +20 -339
- config.json +6 -4
- model-00001-of-00005.safetensors +2 -2
- model-00002-of-00005.safetensors +2 -2
- model-00003-of-00005.safetensors +2 -2
- model-00004-of-00005.safetensors +2 -2
- model-00005-of-00005.safetensors +2 -2
- model.safetensors.index.json +397 -253
- quantize_config.json +6 -4
.msc
CHANGED
Binary files a/.msc and b/.msc differ
.mv
CHANGED
@@ -1 +1 @@
-Revision:master,CreatedAt:
+Revision:master,CreatedAt:1746673573
README.md
CHANGED
@@ -1,354 +1,35 @@
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-4B-Base
---

# Qwen3-4B

<a href="https://chat.qwen.ai/" target="_blank">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Qwen3 Highlights

- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significantly enhanced reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.

## Model Overview

**Qwen3-4B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

> [!TIP]
> If you encounter significant endless repetitions, please refer to the [Best Practices](#best-practices) section for optimal sampling parameters, and set the `presence_penalty` to 1.5.
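
For frameworks that expose `presence_penalty` directly, a minimal sketch with vLLM's offline API shows how these settings fit together (the model name and values mirror this card's recommendations; treat it as a starting point, not the only valid configuration):

```python
# Minimal sketch: recommended sampling plus a repetition penalty, using vLLM's offline API.
# Assumes `pip install vllm`; parameter names follow vllm.SamplingParams.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-4B")
sampling_params = SamplingParams(
    temperature=0.6,       # thinking-mode recommendation
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    presence_penalty=1.5,  # raise toward 1.5 only if you observe endless repetitions
    max_tokens=1024,
)
outputs = llm.generate(["Give me a short introduction to large language models."], sampling_params)
print(outputs[0].outputs[0].text)
```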

## Quickstart

The code for Qwen3 is available in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```
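
Upgrading in place usually resolves it, e.g.:

```shell
pip install --upgrade "transformers>=4.51.0"
```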

The following code snippet illustrates how to use the model to generate content based on given inputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
- SGLang:
  ```shell
  python -m sglang.launch_server --model-path Qwen/Qwen3-4B --reasoning-parser qwen3
  ```
- vLLM:
  ```shell
  vllm serve Qwen/Qwen3-4B --enable-reasoning --reasoning-parser deepseek_r1
  ```

For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.

## Switching Between Thinking and Non-Thinking Mode

> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
> Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

### `enable_thinking=True`

By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting `enable_thinking=True` or leaving it as the default value in `tokenizer.apply_chat_template`, the model will engage its thinking mode.

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```

In this mode, the model will generate thinking content wrapped in a `<think>...</think>` block, followed by the final response.

> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`. **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.

### `enable_thinking=False`

We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```

In this mode, the model will not generate any thinking content and will not include a `<think>...</think>` block.

> [!NOTE]
> For non-thinking mode, we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.

### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input

We provide a soft-switch mechanism that allows users to dynamically control the model's behavior when `enable_thinking=True`. Specifically, you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.

Here is an example of a multi-turn conversation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-4B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```

> [!NOTE]
> For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
> When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate thinking content and will not include a `<think>...</think>` block.

## Agentic Use

Qwen3 excels in tool-calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of the agentic abilities of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.

To define the available tools, you can use an MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools by yourself.
```python
from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-4B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #     # Add: when the response content is `<think>this is the thought</think>this is the answer`;
    #     # Do not add: when the response has been separated by reasoning_content and content.
    #     'thought_in_content': True,
    # },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            'time': {
                'command': 'uvx',
                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
            },
            "fetch": {
                "command": "uvx",
                "args": ["mcp-server-fetch"]
            }
        }
    },
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```

## Processing Long Texts

Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, and `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:

- Modifying the model files:
  In the `config.json` file, add the `rope_scaling` fields:
  ```json
  {
      ...,
      "rope_scaling": {
          "rope_type": "yarn",
          "factor": 4.0,
          "original_max_position_embeddings": 32768
      }
  }
  ```
  For `llama.cpp`, you need to regenerate the GGUF file after the modification.

- Passing command line arguments:

  For `vllm`, you can use
  ```shell
  vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
  python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```

  For `llama-server` from `llama.cpp`, you can use
  ```shell
  llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
  ```

> [!IMPORTANT]
> If you encounter the following warning
> ```
> Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
> ```
> please upgrade to `transformers>=4.51.0`.

> [!NOTE]
> All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.**
> We advise adding the `rope_scaling` configuration only when processing long contexts is required.
> It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` to 2.0.

> [!NOTE]
> The default `max_position_embeddings` in `config.json` is set to 40,960. This allocation reserves 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN, as it may potentially degrade model performance.

> [!TIP]
> The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default, and no extra configuration is needed.

## Best Practices

To achieve optimal performance, we recommend the following settings (a sketch of applying the sampling parameters follows this list):

1. **Sampling Parameters**:
   - For thinking mode (`enable_thinking=True`), use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`. **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
   - For non-thinking mode (`enable_thinking=False`), we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
   - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.

2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.

3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
   - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
   - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."

4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This is implemented in the provided Jinja2 chat template. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that this best practice is followed.
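
As referenced above, a minimal sketch of the thinking-mode sampling settings with `transformers` (reusing `model`, `tokenizer`, and `model_inputs` from the Quickstart; the keyword names are standard `GenerationConfig` fields):

```python
# Minimal sketch: thinking-mode sampling recommendations applied to model.generate.
generated_ids = model.generate(
    **model_inputs,
    do_sample=True,       # never use greedy decoding in thinking mode
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    max_new_tokens=32768  # adequate output length for most queries
)
```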

### Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen3,
    title  = {Qwen3},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {April},
    year   = {2025}
}
```

license: apache-2.0

# Tongyi Qianwen Qwen3-30B-A3B-GPTQ-Int4 Quantization
Base model: [Qwen3-30B-A3B](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B)

### Recent Updates
```
2025-05-08
fix: (model.layers.*.mlp.gate) were not quantized
```

### Dependencies
```
vllm==0.8.5
```
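
With this dependency installed, one way to serve the checkpoint is vLLM's OpenAI-compatible server; a minimal sketch (model ID as on this card; adjust parallelism and context length to your hardware):

```shell
# Minimal sketch: serve the GPTQ-Int4 checkpoint with vLLM (OpenAI-compatible API on port 8000).
vllm serve JunHowie/Qwen3-30B-A3B-GPTQ-Int4 --max-model-len 32768
```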

SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('JunHowie/Qwen3-30B-A3B-GPTQ-Int4')
```
Git download
```
# Download the model with Git
git clone https://www.modelscope.cn/JunHowie/Qwen3-30B-A3B-GPTQ-Int4.git
```
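
Once the server from the dependency section is up, a quick smoke test through the OpenAI-compatible endpoint might look like this (a sketch; the base URL and placeholder API key are vLLM's defaults):

```python
# Minimal sketch: query a local vLLM server through the OpenAI-compatible API.
# Assumes `pip install openai` and a server started as sketched in the dependency section.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="JunHowie/Qwen3-30B-A3B-GPTQ-Int4",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
)
print(resp.choices[0].message.content)
```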

<p style="color: lightgrey;">If you are a contributor to this model, we invite you to promptly complete the model card content in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
config.json
CHANGED
@@ -31,15 +31,17 @@
 "group_size": 128,
 "lm_head": false,
 "meta": {
-"damp_auto_increment": 0.
-"damp_percent": 0.
+"damp_auto_increment": 0.01,
+"damp_percent": 0.05,
 "mse": 0.0,
 "quantizer": [
-"gptqmodel:
+"gptqmodel:4.0.0-dev"
 ],
 "static_groups": false,
 "true_sequential": true,
-"uri": "https://github.com/modelcloud/gptqmodel"
+"uri": "https://github.com/modelcloud/gptqmodel",
+"v2": false,
+"v2_alpha": 0.25
 },
 "pack_dtype": "int32",
 "quant_method": "gptq",
model-00001-of-00005.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:15e5be8644a3e4c3ad1f05b48d6caf6c57d09772085dc9142a5560c4d2b7b242
+size 4001615168
model-00002-of-00005.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:621653d6bc683384afa8df31474c531a31012df22f9b33ebcb5cc1335dd9d070
+size 4001632008
model-00003-of-00005.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:84eaf9be046f6dd2a8c73b8d806e3cb7c3b0f7f1eac788f9c7922aa8ef31a0bb
+size 4001632136
model-00004-of-00005.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9103e328ced3707a04fd14fdacf59c12d58e512f8a1004310b342787ca52ae55
+size 4001745272
model-00005-of-00005.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:504462bea741d0e85d5cfc85989bb2f20eac5671caca96d5fa537d39bbc94de3
+size 908307360
model.safetensors.index.json
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
{
|
2 |
"metadata": {
|
3 |
-
"total_size":
|
4 |
},
|
5 |
"weight_map": {
|
6 |
"lm_head.weight": "model-00005-of-00005.safetensors",
|
@@ -1542,7 +1542,10 @@
|
|
1542 |
"model.layers.0.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
1543 |
"model.layers.0.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
1544 |
"model.layers.0.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
1545 |
-
"model.layers.0.mlp.gate.
|
|
|
|
|
|
|
1546 |
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
1547 |
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
1548 |
"model.layers.0.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
@@ -3099,7 +3102,10 @@
|
|
3099 |
"model.layers.1.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
3100 |
"model.layers.1.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
3101 |
"model.layers.1.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
3102 |
-
"model.layers.1.mlp.gate.
|
|
|
|
|
|
|
3103 |
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
3104 |
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
3105 |
"model.layers.1.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
@@ -3876,10 +3882,10 @@
|
|
3876 |
"model.layers.10.mlp.experts.4.up_proj.qweight": "model-00001-of-00005.safetensors",
|
3877 |
"model.layers.10.mlp.experts.4.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
3878 |
"model.layers.10.mlp.experts.4.up_proj.scales": "model-00001-of-00005.safetensors",
|
3879 |
-
"model.layers.10.mlp.experts.40.down_proj.g_idx": "model-
|
3880 |
-
"model.layers.10.mlp.experts.40.down_proj.qweight": "model-
|
3881 |
-
"model.layers.10.mlp.experts.40.down_proj.qzeros": "model-
|
3882 |
-
"model.layers.10.mlp.experts.40.down_proj.scales": "model-
|
3883 |
"model.layers.10.mlp.experts.40.gate_proj.g_idx": "model-00001-of-00005.safetensors",
|
3884 |
"model.layers.10.mlp.experts.40.gate_proj.qweight": "model-00001-of-00005.safetensors",
|
3885 |
"model.layers.10.mlp.experts.40.gate_proj.qzeros": "model-00001-of-00005.safetensors",
|
@@ -3888,26 +3894,26 @@
|
|
3888 |
"model.layers.10.mlp.experts.40.up_proj.qweight": "model-00001-of-00005.safetensors",
|
3889 |
"model.layers.10.mlp.experts.40.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
3890 |
"model.layers.10.mlp.experts.40.up_proj.scales": "model-00001-of-00005.safetensors",
|
3891 |
-
"model.layers.10.mlp.experts.41.down_proj.g_idx": "model-
|
3892 |
-
"model.layers.10.mlp.experts.41.down_proj.qweight": "model-
|
3893 |
-
"model.layers.10.mlp.experts.41.down_proj.qzeros": "model-
|
3894 |
-
"model.layers.10.mlp.experts.41.down_proj.scales": "model-
|
3895 |
-
"model.layers.10.mlp.experts.41.gate_proj.g_idx": "model-
|
3896 |
-
"model.layers.10.mlp.experts.41.gate_proj.qweight": "model-
|
3897 |
-
"model.layers.10.mlp.experts.41.gate_proj.qzeros": "model-
|
3898 |
-
"model.layers.10.mlp.experts.41.gate_proj.scales": "model-
|
3899 |
-
"model.layers.10.mlp.experts.41.up_proj.g_idx": "model-
|
3900 |
-
"model.layers.10.mlp.experts.41.up_proj.qweight": "model-
|
3901 |
-
"model.layers.10.mlp.experts.41.up_proj.qzeros": "model-
|
3902 |
-
"model.layers.10.mlp.experts.41.up_proj.scales": "model-
|
3903 |
"model.layers.10.mlp.experts.42.down_proj.g_idx": "model-00002-of-00005.safetensors",
|
3904 |
"model.layers.10.mlp.experts.42.down_proj.qweight": "model-00002-of-00005.safetensors",
|
3905 |
"model.layers.10.mlp.experts.42.down_proj.qzeros": "model-00002-of-00005.safetensors",
|
3906 |
"model.layers.10.mlp.experts.42.down_proj.scales": "model-00002-of-00005.safetensors",
|
3907 |
-
"model.layers.10.mlp.experts.42.gate_proj.g_idx": "model-
|
3908 |
-
"model.layers.10.mlp.experts.42.gate_proj.qweight": "model-
|
3909 |
-
"model.layers.10.mlp.experts.42.gate_proj.qzeros": "model-
|
3910 |
-
"model.layers.10.mlp.experts.42.gate_proj.scales": "model-
|
3911 |
"model.layers.10.mlp.experts.42.up_proj.g_idx": "model-00002-of-00005.safetensors",
|
3912 |
"model.layers.10.mlp.experts.42.up_proj.qweight": "model-00002-of-00005.safetensors",
|
3913 |
"model.layers.10.mlp.experts.42.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
@@ -4656,7 +4662,10 @@
|
|
4656 |
"model.layers.10.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
4657 |
"model.layers.10.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
4658 |
"model.layers.10.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
4659 |
-
"model.layers.10.mlp.gate.
|
|
|
|
|
|
|
4660 |
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
4661 |
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
4662 |
"model.layers.10.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
@@ -6213,7 +6222,10 @@
|
|
6213 |
"model.layers.11.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
6214 |
"model.layers.11.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
6215 |
"model.layers.11.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
6216 |
-
"model.layers.11.mlp.gate.
|
|
|
|
|
|
|
6217 |
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
6218 |
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
6219 |
"model.layers.11.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -7770,7 +7782,10 @@
|
|
7770 |
"model.layers.12.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
7771 |
"model.layers.12.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
7772 |
"model.layers.12.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
7773 |
-
"model.layers.12.mlp.gate.
|
|
|
|
|
|
|
7774 |
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
7775 |
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
7776 |
"model.layers.12.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -9327,7 +9342,10 @@
|
|
9327 |
"model.layers.13.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
9328 |
"model.layers.13.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
9329 |
"model.layers.13.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
9330 |
-
"model.layers.13.mlp.gate.
|
|
|
|
|
|
|
9331 |
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
9332 |
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
9333 |
"model.layers.13.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -10884,7 +10902,10 @@
|
|
10884 |
"model.layers.14.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
10885 |
"model.layers.14.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
10886 |
"model.layers.14.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
10887 |
-
"model.layers.14.mlp.gate.
|
|
|
|
|
|
|
10888 |
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
10889 |
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
10890 |
"model.layers.14.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -12441,7 +12462,10 @@
|
|
12441 |
"model.layers.15.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
12442 |
"model.layers.15.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
12443 |
"model.layers.15.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
12444 |
-
"model.layers.15.mlp.gate.
|
|
|
|
|
|
|
12445 |
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
12446 |
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
12447 |
"model.layers.15.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -13998,7 +14022,10 @@
|
|
13998 |
"model.layers.16.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
13999 |
"model.layers.16.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
14000 |
"model.layers.16.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
14001 |
-
"model.layers.16.mlp.gate.
|
|
|
|
|
|
|
14002 |
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
14003 |
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
14004 |
"model.layers.16.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -15555,7 +15582,10 @@
|
|
15555 |
"model.layers.17.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
15556 |
"model.layers.17.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
15557 |
"model.layers.17.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
15558 |
-
"model.layers.17.mlp.gate.
|
|
|
|
|
|
|
15559 |
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
15560 |
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
15561 |
"model.layers.17.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -17112,7 +17142,10 @@
|
|
17112 |
"model.layers.18.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
17113 |
"model.layers.18.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
17114 |
"model.layers.18.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
17115 |
-
"model.layers.18.mlp.gate.
|
|
|
|
|
|
|
17116 |
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
17117 |
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
17118 |
"model.layers.18.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -18669,7 +18702,10 @@
|
|
18669 |
"model.layers.19.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
18670 |
"model.layers.19.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
18671 |
"model.layers.19.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
18672 |
-
"model.layers.19.mlp.gate.
|
|
|
|
|
|
|
18673 |
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
18674 |
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
18675 |
"model.layers.19.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -20226,7 +20262,10 @@
|
|
20226 |
"model.layers.2.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
20227 |
"model.layers.2.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
20228 |
"model.layers.2.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
20229 |
-
"model.layers.2.mlp.gate.
|
|
|
|
|
|
|
20230 |
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
20231 |
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
20232 |
"model.layers.2.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
@@ -21783,7 +21822,10 @@
|
|
21783 |
"model.layers.20.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
21784 |
"model.layers.20.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
21785 |
"model.layers.20.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
21786 |
-
"model.layers.20.mlp.gate.
|
|
|
|
|
|
|
21787 |
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
21788 |
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
21789 |
"model.layers.20.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -23340,7 +23382,10 @@
|
|
23340 |
"model.layers.21.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
|
23341 |
"model.layers.21.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
23342 |
"model.layers.21.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
|
23343 |
-
"model.layers.21.mlp.gate.
|
|
|
|
|
|
|
23344 |
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
23345 |
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
23346 |
"model.layers.21.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -24549,50 +24594,50 @@
|
|
24549 |
"model.layers.22.mlp.experts.72.up_proj.qweight": "model-00002-of-00005.safetensors",
|
24550 |
"model.layers.22.mlp.experts.72.up_proj.qzeros": "model-00002-of-00005.safetensors",
|
24551 |
"model.layers.22.mlp.experts.72.up_proj.scales": "model-00002-of-00005.safetensors",
|
24552 |
-
"model.layers.22.mlp.experts.73.down_proj.g_idx": "model-
|
24553 |
-
"model.layers.22.mlp.experts.73.down_proj.qweight": "model-
|
24554 |
-
"model.layers.22.mlp.experts.73.down_proj.qzeros": "model-
|
24555 |
-
"model.layers.22.mlp.experts.73.down_proj.scales": "model-
|
24556 |
-
"model.layers.22.mlp.experts.73.gate_proj.g_idx": "model-
|
24557 |
-
"model.layers.22.mlp.experts.73.gate_proj.qweight": "model-
|
24558 |
-
"model.layers.22.mlp.experts.73.gate_proj.qzeros": "model-
|
24559 |
-
"model.layers.22.mlp.experts.73.gate_proj.scales": "model-
|
24560 |
-
"model.layers.22.mlp.experts.73.up_proj.g_idx": "model-
|
24561 |
-
"model.layers.22.mlp.experts.73.up_proj.qweight": "model-
|
24562 |
-
"model.layers.22.mlp.experts.73.up_proj.qzeros": "model-
|
24563 |
-
"model.layers.22.mlp.experts.73.up_proj.scales": "model-
|
24564 |
-
"model.layers.22.mlp.experts.74.down_proj.g_idx": "model-
|
24565 |
-
"model.layers.22.mlp.experts.74.down_proj.qweight": "model-
|
24566 |
-
"model.layers.22.mlp.experts.74.down_proj.qzeros": "model-
|
24567 |
-
"model.layers.22.mlp.experts.74.down_proj.scales": "model-
|
24568 |
-
"model.layers.22.mlp.experts.74.gate_proj.g_idx": "model-
|
24569 |
-
"model.layers.22.mlp.experts.74.gate_proj.qweight": "model-
|
24570 |
-
"model.layers.22.mlp.experts.74.gate_proj.qzeros": "model-
|
24571 |
-
"model.layers.22.mlp.experts.74.gate_proj.scales": "model-
|
24572 |
-
"model.layers.22.mlp.experts.74.up_proj.g_idx": "model-
|
24573 |
-
"model.layers.22.mlp.experts.74.up_proj.qweight": "model-
|
24574 |
-
"model.layers.22.mlp.experts.74.up_proj.qzeros": "model-
|
24575 |
-
"model.layers.22.mlp.experts.74.up_proj.scales": "model-
|
24576 |
-
"model.layers.22.mlp.experts.75.down_proj.g_idx": "model-
|
24577 |
-
"model.layers.22.mlp.experts.75.down_proj.qweight": "model-
|
24578 |
-
"model.layers.22.mlp.experts.75.down_proj.qzeros": "model-
|
24579 |
-
"model.layers.22.mlp.experts.75.down_proj.scales": "model-
|
24580 |
-
"model.layers.22.mlp.experts.75.gate_proj.g_idx": "model-
|
24581 |
-
"model.layers.22.mlp.experts.75.gate_proj.qweight": "model-
|
24582 |
-
"model.layers.22.mlp.experts.75.gate_proj.qzeros": "model-
|
24583 |
-
"model.layers.22.mlp.experts.75.gate_proj.scales": "model-
|
24584 |
-
"model.layers.22.mlp.experts.75.up_proj.g_idx": "model-
|
24585 |
-
"model.layers.22.mlp.experts.75.up_proj.qweight": "model-
|
24586 |
-
"model.layers.22.mlp.experts.75.up_proj.qzeros": "model-
|
24587 |
-
"model.layers.22.mlp.experts.75.up_proj.scales": "model-
|
24588 |
"model.layers.22.mlp.experts.76.down_proj.g_idx": "model-00003-of-00005.safetensors",
|
24589 |
"model.layers.22.mlp.experts.76.down_proj.qweight": "model-00003-of-00005.safetensors",
|
24590 |
"model.layers.22.mlp.experts.76.down_proj.qzeros": "model-00003-of-00005.safetensors",
|
24591 |
"model.layers.22.mlp.experts.76.down_proj.scales": "model-00003-of-00005.safetensors",
|
24592 |
-
"model.layers.22.mlp.experts.76.gate_proj.g_idx": "model-
|
24593 |
-
"model.layers.22.mlp.experts.76.gate_proj.qweight": "model-
|
24594 |
-
"model.layers.22.mlp.experts.76.gate_proj.qzeros": "model-
|
24595 |
-
"model.layers.22.mlp.experts.76.gate_proj.scales": "model-
|
24596 |
"model.layers.22.mlp.experts.76.up_proj.g_idx": "model-00003-of-00005.safetensors",
|
24597 |
"model.layers.22.mlp.experts.76.up_proj.qweight": "model-00003-of-00005.safetensors",
|
24598 |
"model.layers.22.mlp.experts.76.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
@@ -24897,7 +24942,10 @@
|
|
24897 |
"model.layers.22.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
24898 |
"model.layers.22.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
24899 |
"model.layers.22.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
24900 |
-
"model.layers.22.mlp.gate.
|
|
|
|
|
|
|
24901 |
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
24902 |
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
24903 |
"model.layers.22.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
|
@@ -26454,7 +26502,10 @@
|
|
26454 |
"model.layers.23.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
26455 |
"model.layers.23.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
26456 |
"model.layers.23.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
26457 |
-
"model.layers.23.mlp.gate.
|
|
|
|
|
|
|
26458 |
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
26459 |
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
26460 |
"model.layers.23.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -28011,7 +28062,10 @@
|
|
28011 |
"model.layers.24.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
28012 |
"model.layers.24.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
28013 |
"model.layers.24.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
28014 |
-
"model.layers.24.mlp.gate.
|
|
|
|
|
|
|
28015 |
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
28016 |
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
28017 |
"model.layers.24.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -29568,7 +29622,10 @@
|
|
29568 |
"model.layers.25.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
29569 |
"model.layers.25.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
29570 |
"model.layers.25.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
29571 |
-
"model.layers.25.mlp.gate.
|
|
|
|
|
|
|
29572 |
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
29573 |
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
29574 |
"model.layers.25.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -31125,7 +31182,10 @@
|
|
31125 |
"model.layers.26.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
31126 |
"model.layers.26.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
31127 |
"model.layers.26.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
31128 |
-
"model.layers.26.mlp.gate.
|
|
|
|
|
|
|
31129 |
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
31130 |
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
31131 |
"model.layers.26.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -32682,7 +32742,10 @@
|
|
32682 |
"model.layers.27.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
32683 |
"model.layers.27.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
32684 |
"model.layers.27.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
32685 |
-
"model.layers.27.mlp.gate.
|
|
|
|
|
|
|
32686 |
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
32687 |
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
32688 |
"model.layers.27.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -34239,7 +34302,10 @@
|
|
34239 |
"model.layers.28.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
34240 |
"model.layers.28.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
34241 |
"model.layers.28.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
34242 |
-
"model.layers.28.mlp.gate.
|
|
|
|
|
|
|
34243 |
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
34244 |
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
34245 |
"model.layers.28.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -35796,7 +35862,10 @@
|
|
35796 |
"model.layers.29.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
35797 |
"model.layers.29.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
35798 |
"model.layers.29.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
35799 |
-
"model.layers.29.mlp.gate.
|
|
|
|
|
|
|
35800 |
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
35801 |
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
35802 |
"model.layers.29.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -37353,7 +37422,10 @@
|
|
37353 |
"model.layers.3.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
37354 |
"model.layers.3.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
37355 |
"model.layers.3.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
37356 |
-
"model.layers.3.mlp.gate.
|
|
|
|
|
|
|
37357 |
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
37358 |
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
37359 |
"model.layers.3.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
@@ -38910,7 +38982,10 @@
|
|
38910 |
"model.layers.30.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
38911 |
"model.layers.30.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
38912 |
"model.layers.30.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
38913 |
-
"model.layers.30.mlp.gate.
|
|
|
|
|
|
|
38914 |
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
38915 |
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
38916 |
"model.layers.30.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -40467,7 +40542,10 @@
|
|
40467 |
"model.layers.31.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
40468 |
"model.layers.31.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
40469 |
"model.layers.31.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
40470 |
-
"model.layers.31.mlp.gate.
|
|
|
|
|
|
|
40471 |
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
40472 |
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
40473 |
"model.layers.31.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -42024,7 +42102,10 @@
|
|
42024 |
"model.layers.32.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
42025 |
"model.layers.32.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
42026 |
"model.layers.32.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
42027 |
-
"model.layers.32.mlp.gate.
|
|
|
|
|
|
|
42028 |
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
42029 |
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
42030 |
"model.layers.32.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -43581,7 +43662,10 @@
|
|
43581 |
"model.layers.33.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
43582 |
"model.layers.33.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
43583 |
"model.layers.33.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
43584 |
-
"model.layers.33.mlp.gate.
|
|
|
|
|
|
|
43585 |
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
43586 |
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
43587 |
"model.layers.33.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
@@ -43698,66 +43782,66 @@
|
|
43698 |
"model.layers.34.mlp.experts.104.up_proj.qweight": "model-00003-of-00005.safetensors",
|
43699 |
"model.layers.34.mlp.experts.104.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
43700 |
"model.layers.34.mlp.experts.104.up_proj.scales": "model-00003-of-00005.safetensors",
|
43701 |
-
"model.layers.34.mlp.experts.105.down_proj.g_idx": "model-
|
43702 |
-
"model.layers.34.mlp.experts.105.down_proj.qweight": "model-
|
43703 |
-
"model.layers.34.mlp.experts.105.down_proj.qzeros": "model-
|
43704 |
-
"model.layers.34.mlp.experts.105.down_proj.scales": "model-
|
43705 |
"model.layers.34.mlp.experts.105.gate_proj.g_idx": "model-00003-of-00005.safetensors",
|
43706 |
"model.layers.34.mlp.experts.105.gate_proj.qweight": "model-00003-of-00005.safetensors",
|
43707 |
"model.layers.34.mlp.experts.105.gate_proj.qzeros": "model-00003-of-00005.safetensors",
|
43708 |
"model.layers.34.mlp.experts.105.gate_proj.scales": "model-00003-of-00005.safetensors",
|
43709 |
-
"model.layers.34.mlp.experts.105.up_proj.g_idx": "model-
|
43710 |
-
"model.layers.34.mlp.experts.105.up_proj.qweight": "model-
|
43711 |
-
"model.layers.34.mlp.experts.105.up_proj.qzeros": "model-
|
43712 |
-
"model.layers.34.mlp.experts.105.up_proj.scales": "model-
|
43713 |
-
"model.layers.34.mlp.experts.106.down_proj.g_idx": "model-
|
43714 |
-
"model.layers.34.mlp.experts.106.down_proj.qweight": "model-
|
43715 |
-
"model.layers.34.mlp.experts.106.down_proj.qzeros": "model-
|
43716 |
-
"model.layers.34.mlp.experts.106.down_proj.scales": "model-
|
43717 |
-
"model.layers.34.mlp.experts.106.gate_proj.g_idx": "model-
|
43718 |
-
"model.layers.34.mlp.experts.106.gate_proj.qweight": "model-
|
43719 |
-
"model.layers.34.mlp.experts.106.gate_proj.qzeros": "model-
|
43720 |
-
"model.layers.34.mlp.experts.106.gate_proj.scales": "model-
|
43721 |
-
"model.layers.34.mlp.experts.106.up_proj.g_idx": "model-
|
43722 |
-
"model.layers.34.mlp.experts.106.up_proj.qweight": "model-
|
43723 |
-
"model.layers.34.mlp.experts.106.up_proj.qzeros": "model-
|
43724 |
-
"model.layers.34.mlp.experts.106.up_proj.scales": "model-
|
43725 |
-
"model.layers.34.mlp.experts.107.down_proj.g_idx": "model-
|
43726 |
-
"model.layers.34.mlp.experts.107.down_proj.qweight": "model-
|
43727 |
-
"model.layers.34.mlp.experts.107.down_proj.qzeros": "model-
|
43728 |
-
"model.layers.34.mlp.experts.107.down_proj.scales": "model-
|
43729 |
-
"model.layers.34.mlp.experts.107.gate_proj.g_idx": "model-
|
43730 |
-
"model.layers.34.mlp.experts.107.gate_proj.qweight": "model-
|
43731 |
-
"model.layers.34.mlp.experts.107.gate_proj.qzeros": "model-
|
43732 |
-
"model.layers.34.mlp.experts.107.gate_proj.scales": "model-
|
43733 |
-
"model.layers.34.mlp.experts.107.up_proj.g_idx": "model-
|
43734 |
-
"model.layers.34.mlp.experts.107.up_proj.qweight": "model-
|
43735 |
-
"model.layers.34.mlp.experts.107.up_proj.qzeros": "model-
|
43736 |
-
"model.layers.34.mlp.experts.107.up_proj.scales": "model-
|
43737 |
-
"model.layers.34.mlp.experts.108.down_proj.g_idx": "model-
|
43738 |
-
"model.layers.34.mlp.experts.108.down_proj.qweight": "model-
|
43739 |
-
"model.layers.34.mlp.experts.108.down_proj.qzeros": "model-
|
43740 |
-
"model.layers.34.mlp.experts.108.down_proj.scales": "model-
|
43741 |
-
"model.layers.34.mlp.experts.108.gate_proj.g_idx": "model-
|
43742 |
-
"model.layers.34.mlp.experts.108.gate_proj.qweight": "model-
|
43743 |
-
"model.layers.34.mlp.experts.108.gate_proj.qzeros": "model-
|
43744 |
-
"model.layers.34.mlp.experts.108.gate_proj.scales": "model-
|
43745 |
-
"model.layers.34.mlp.experts.108.up_proj.g_idx": "model-
|
43746 |
-
"model.layers.34.mlp.experts.108.up_proj.qweight": "model-
|
43747 |
-
"model.layers.34.mlp.experts.108.up_proj.qzeros": "model-
|
43748 |
-
"model.layers.34.mlp.experts.108.up_proj.scales": "model-
|
43749 |
-
"model.layers.34.mlp.experts.109.down_proj.g_idx": "model-
|
43750 |
-
"model.layers.34.mlp.experts.109.down_proj.qweight": "model-
|
43751 |
-
"model.layers.34.mlp.experts.109.down_proj.qzeros": "model-
|
43752 |
-
"model.layers.34.mlp.experts.109.down_proj.scales": "model-
|
43753 |
-
"model.layers.34.mlp.experts.109.gate_proj.g_idx": "model-
|
43754 |
-
"model.layers.34.mlp.experts.109.gate_proj.qweight": "model-
|
43755 |
-
"model.layers.34.mlp.experts.109.gate_proj.qzeros": "model-
|
43756 |
-
"model.layers.34.mlp.experts.109.gate_proj.scales": "model-
|
43757 |
-
"model.layers.34.mlp.experts.109.up_proj.g_idx": "model-
|
43758 |
-
"model.layers.34.mlp.experts.109.up_proj.qweight": "model-
|
43759 |
-
"model.layers.34.mlp.experts.109.up_proj.qzeros": "model-
|
43760 |
-
"model.layers.34.mlp.experts.109.up_proj.scales": "model-
|
43761 |
"model.layers.34.mlp.experts.11.down_proj.g_idx": "model-00003-of-00005.safetensors",
|
43762 |
"model.layers.34.mlp.experts.11.down_proj.qweight": "model-00003-of-00005.safetensors",
|
43763 |
"model.layers.34.mlp.experts.11.down_proj.qzeros": "model-00003-of-00005.safetensors",
|
@@ -43774,10 +43858,10 @@
|
|
43774 |
"model.layers.34.mlp.experts.110.down_proj.qweight": "model-00004-of-00005.safetensors",
|
43775 |
"model.layers.34.mlp.experts.110.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
43776 |
"model.layers.34.mlp.experts.110.down_proj.scales": "model-00004-of-00005.safetensors",
|
43777 |
-
"model.layers.34.mlp.experts.110.gate_proj.g_idx": "model-
|
43778 |
-
"model.layers.34.mlp.experts.110.gate_proj.qweight": "model-
|
43779 |
-
"model.layers.34.mlp.experts.110.gate_proj.qzeros": "model-
|
43780 |
-
"model.layers.34.mlp.experts.110.gate_proj.scales": "model-
|
43781 |
"model.layers.34.mlp.experts.110.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
43782 |
"model.layers.34.mlp.experts.110.up_proj.qweight": "model-00004-of-00005.safetensors",
|
43783 |
"model.layers.34.mlp.experts.110.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
@@ -45138,7 +45222,10 @@
"model.layers.34.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
-"model.layers.34.mlp.gate.
"model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.34.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

@@ -46695,7 +46782,10 @@
"model.layers.35.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.35.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.35.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.35.mlp.gate.
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.35.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -48252,7 +48342,10 @@
"model.layers.36.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.36.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.36.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.36.mlp.gate.
"model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.36.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.36.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -49809,7 +49902,10 @@
"model.layers.37.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.37.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.37.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.37.mlp.gate.
"model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.37.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.37.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -51366,7 +51462,10 @@
"model.layers.38.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.38.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.38.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.38.mlp.gate.
"model.layers.38.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.38.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.38.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -52923,7 +53022,10 @@
"model.layers.39.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.39.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.39.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.39.mlp.gate.
"model.layers.39.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.39.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.39.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -54480,7 +54582,10 @@
"model.layers.4.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.4.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.4.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.4.mlp.gate.
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.4.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

@@ -56037,7 +56142,10 @@
"model.layers.40.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.40.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.40.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.40.mlp.gate.
"model.layers.40.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.40.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.40.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -57594,7 +57702,10 @@
"model.layers.41.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.41.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.41.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.41.mlp.gate.
"model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.41.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.41.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -59151,7 +59262,10 @@
"model.layers.42.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.42.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.42.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.42.mlp.gate.
"model.layers.42.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.42.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.42.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -60708,7 +60822,10 @@
"model.layers.43.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.43.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.43.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.43.mlp.gate.
"model.layers.43.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.43.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.43.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -62265,7 +62382,10 @@
"model.layers.44.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.44.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.44.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.44.mlp.gate.
"model.layers.44.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.44.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.44.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -63822,7 +63942,10 @@
"model.layers.45.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.45.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.45.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.45.mlp.gate.
"model.layers.45.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.45.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.45.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -65379,7 +65502,10 @@
"model.layers.46.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.46.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.46.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.46.mlp.gate.
"model.layers.46.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
"model.layers.46.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.46.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
@@ -65424,18 +65550,18 @@
"model.layers.47.mlp.experts.1.up_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.47.mlp.experts.1.up_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.47.mlp.experts.1.up_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.47.mlp.experts.10.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.10.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.10.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.10.down_proj.scales": "model-
-"model.layers.47.mlp.experts.10.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.10.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.10.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.10.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.10.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.10.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.10.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.10.up_proj.scales": "model-
"model.layers.47.mlp.experts.100.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.100.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.100.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -65556,18 +65682,18 @@
"model.layers.47.mlp.experts.109.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.109.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.109.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.11.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.11.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.11.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.11.down_proj.scales": "model-
-"model.layers.47.mlp.experts.11.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.11.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.11.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.11.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.11.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.11.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.11.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.11.up_proj.scales": "model-
"model.layers.47.mlp.experts.110.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.110.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.110.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -65692,10 +65818,10 @@
"model.layers.47.mlp.experts.12.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.12.down_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.12.down_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.12.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.12.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.12.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.12.gate_proj.scales": "model-
"model.layers.47.mlp.experts.12.up_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.12.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.12.up_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66276,18 +66402,18 @@
"model.layers.47.mlp.experts.49.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.49.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.49.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.5.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.5.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.5.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.5.down_proj.scales": "model-
"model.layers.47.mlp.experts.5.gate_proj.g_idx": "model-00004-of-00005.safetensors",
"model.layers.47.mlp.experts.5.gate_proj.qweight": "model-00004-of-00005.safetensors",
"model.layers.47.mlp.experts.5.gate_proj.qzeros": "model-00004-of-00005.safetensors",
"model.layers.47.mlp.experts.5.gate_proj.scales": "model-00004-of-00005.safetensors",
-"model.layers.47.mlp.experts.5.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.5.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.5.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.5.up_proj.scales": "model-
"model.layers.47.mlp.experts.50.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.50.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.50.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -66408,18 +66534,18 @@
"model.layers.47.mlp.experts.59.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.59.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.59.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.6.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.6.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.6.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.6.down_proj.scales": "model-
-"model.layers.47.mlp.experts.6.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.6.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.6.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.6.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.6.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.6.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.6.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.6.up_proj.scales": "model-
"model.layers.47.mlp.experts.60.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.60.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.60.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -66540,18 +66666,18 @@
"model.layers.47.mlp.experts.69.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.69.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.69.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.7.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.7.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.7.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.7.down_proj.scales": "model-
-"model.layers.47.mlp.experts.7.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.7.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.7.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.7.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.7.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.7.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.7.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.7.up_proj.scales": "model-
"model.layers.47.mlp.experts.70.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.70.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.70.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -66672,18 +66798,18 @@
"model.layers.47.mlp.experts.79.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.79.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.79.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.8.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.8.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.8.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.8.down_proj.scales": "model-
-"model.layers.47.mlp.experts.8.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.8.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.8.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.8.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.8.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.8.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.8.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.8.up_proj.scales": "model-
"model.layers.47.mlp.experts.80.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.80.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.80.down_proj.qzeros": "model-00005-of-00005.safetensors",

@@ -66804,18 +66930,18 @@
"model.layers.47.mlp.experts.89.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.89.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.89.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.experts.9.down_proj.g_idx": "model-
-"model.layers.47.mlp.experts.9.down_proj.qweight": "model-
-"model.layers.47.mlp.experts.9.down_proj.qzeros": "model-
-"model.layers.47.mlp.experts.9.down_proj.scales": "model-
-"model.layers.47.mlp.experts.9.gate_proj.g_idx": "model-
-"model.layers.47.mlp.experts.9.gate_proj.qweight": "model-
-"model.layers.47.mlp.experts.9.gate_proj.qzeros": "model-
-"model.layers.47.mlp.experts.9.gate_proj.scales": "model-
-"model.layers.47.mlp.experts.9.up_proj.g_idx": "model-
-"model.layers.47.mlp.experts.9.up_proj.qweight": "model-
-"model.layers.47.mlp.experts.9.up_proj.qzeros": "model-
-"model.layers.47.mlp.experts.9.up_proj.scales": "model-
"model.layers.47.mlp.experts.90.down_proj.g_idx": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.90.down_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.90.down_proj.qzeros": "model-00005-of-00005.safetensors",
@@ -66936,7 +67062,10 @@
"model.layers.47.mlp.experts.99.up_proj.qweight": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.99.up_proj.qzeros": "model-00005-of-00005.safetensors",
"model.layers.47.mlp.experts.99.up_proj.scales": "model-00005-of-00005.safetensors",
-"model.layers.47.mlp.gate.
"model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
"model.layers.47.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
"model.layers.47.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",

@@ -68493,7 +68622,10 @@
"model.layers.5.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.5.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.5.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.5.mlp.gate.
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.5.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

@@ -70050,7 +70182,10 @@
"model.layers.6.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.6.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.6.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.6.mlp.gate.
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.6.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

@@ -71607,7 +71742,10 @@
"model.layers.7.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.7.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.7.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.7.mlp.gate.
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.7.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

@@ -73164,7 +73302,10 @@
"model.layers.8.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.8.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.8.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.8.mlp.gate.
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.8.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

@@ -74721,7 +74862,10 @@
"model.layers.9.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.9.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.9.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
-"model.layers.9.mlp.gate.
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.9.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
{
"metadata": {
+"total_size": 16905940992
},
"weight_map": {
"lm_head.weight": "model-00005-of-00005.safetensors",

"model.layers.0.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.0.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.0.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.0.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.0.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+"model.layers.0.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.0.mlp.gate.scales": "model-00001-of-00005.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.0.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

"model.layers.1.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.1.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.1.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.1.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.1.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+"model.layers.1.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.1.mlp.gate.scales": "model-00001-of-00005.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.1.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.4.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.4.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.4.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.40.down_proj.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.40.down_proj.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.40.down_proj.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.40.down_proj.scales": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.40.gate_proj.g_idx": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.40.gate_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.40.gate_proj.qzeros": "model-00001-of-00005.safetensors",

"model.layers.10.mlp.experts.40.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.40.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.40.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.down_proj.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.down_proj.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.down_proj.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.down_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.gate_proj.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.gate_proj.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.gate_proj.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.gate_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.up_proj.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.up_proj.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.up_proj.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.41.up_proj.scales": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.42.down_proj.g_idx": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.42.down_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.42.down_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.42.down_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.10.mlp.experts.42.gate_proj.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.42.gate_proj.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.42.gate_proj.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.experts.42.gate_proj.scales": "model-00001-of-00005.safetensors",
"model.layers.10.mlp.experts.42.up_proj.g_idx": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.42.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.42.up_proj.qzeros": "model-00002-of-00005.safetensors",

"model.layers.10.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.10.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.10.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.10.mlp.gate.scales": "model-00001-of-00005.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.10.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
"model.layers.11.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.11.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.11.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.11.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.11.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.11.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.11.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.11.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.12.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.12.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.12.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.12.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.12.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.12.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.12.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.12.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.13.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.13.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.13.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.13.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.13.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.13.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.13.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.13.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.14.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.14.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.14.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.14.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.14.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.14.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.14.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.14.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.15.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.15.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.15.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.15.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.15.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.15.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.15.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.15.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.16.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.16.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.16.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.16.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.16.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.16.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.16.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.16.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.17.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.17.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.17.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.17.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.17.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.17.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.17.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.17.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.18.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.18.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.18.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.18.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.18.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.18.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.18.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.18.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.19.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.19.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.19.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.19.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.19.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.19.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.19.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.19.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.2.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.2.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.2.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.2.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.2.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+"model.layers.2.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.2.mlp.gate.scales": "model-00001-of-00005.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.2.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

"model.layers.20.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.20.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.20.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.20.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.20.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.20.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.20.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.20.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",

"model.layers.21.mlp.experts.99.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.21.mlp.experts.99.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.21.mlp.experts.99.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.21.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.21.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.21.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.21.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.21.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
"model.layers.22.mlp.experts.72.up_proj.qweight": "model-00002-of-00005.safetensors",
"model.layers.22.mlp.experts.72.up_proj.qzeros": "model-00002-of-00005.safetensors",
"model.layers.22.mlp.experts.72.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.down_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.down_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.down_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.down_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.gate_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.gate_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.up_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.up_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.up_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.73.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.down_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.down_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.down_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.down_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.gate_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.gate_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.up_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.up_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.up_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.74.up_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.down_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.down_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.down_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.down_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.gate_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.gate_proj.scales": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.up_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.up_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.up_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.75.up_proj.scales": "model-00002-of-00005.safetensors",
"model.layers.22.mlp.experts.76.down_proj.g_idx": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.76.down_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.76.down_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.76.down_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.22.mlp.experts.76.gate_proj.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.76.gate_proj.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.76.gate_proj.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.experts.76.gate_proj.scales": "model-00002-of-00005.safetensors",
"model.layers.22.mlp.experts.76.up_proj.g_idx": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.76.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.76.up_proj.qzeros": "model-00003-of-00005.safetensors",

"model.layers.22.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.22.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.22.mlp.gate.g_idx": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.gate.qweight": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.gate.qzeros": "model-00002-of-00005.safetensors",
+"model.layers.22.mlp.gate.scales": "model-00002-of-00005.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
"model.layers.22.self_attn.k_proj.g_idx": "model-00002-of-00005.safetensors",
"model.layers.23.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.23.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.23.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.23.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.23.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.23.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.23.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.23.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.24.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.24.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.24.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.24.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.24.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.24.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.24.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.24.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.25.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.25.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.25.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.25.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.25.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.25.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.25.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.25.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.26.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.26.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.26.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.26.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.26.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.26.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.26.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.26.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.27.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.27.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.27.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.27.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.27.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.27.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.27.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.27.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.28.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.28.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.28.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.28.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.28.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.28.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.28.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.28.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.29.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.29.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.29.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.29.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.29.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.29.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.29.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.29.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.3.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
"model.layers.3.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
"model.layers.3.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
+"model.layers.3.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
+"model.layers.3.mlp.gate.qweight": "model-00001-of-00005.safetensors",
+"model.layers.3.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
+"model.layers.3.mlp.gate.scales": "model-00001-of-00005.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
"model.layers.3.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",

"model.layers.30.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.30.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.30.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.30.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.30.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.30.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.30.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.30.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.31.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.31.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.31.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.31.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.31.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.31.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.31.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.31.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.32.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.32.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.32.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.32.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.32.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.32.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.32.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.32.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",

"model.layers.33.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.33.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.33.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.33.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.33.mlp.gate.qweight": "model-00003-of-00005.safetensors",
+"model.layers.33.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.33.mlp.gate.scales": "model-00003-of-00005.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
"model.layers.33.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.104.up_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.104.up_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.104.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.down_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.down_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.down_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.down_proj.scales": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.105.gate_proj.g_idx": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.105.gate_proj.qweight": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.105.gate_proj.qzeros": "model-00003-of-00005.safetensors",
"model.layers.34.mlp.experts.105.gate_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.up_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.up_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.up_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.105.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.down_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.down_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.down_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.down_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.gate_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.gate_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.up_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.up_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.up_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.106.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.down_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.down_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.down_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.down_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.gate_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.gate_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.up_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.up_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.up_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.107.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.down_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.down_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.down_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.down_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.gate_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.gate_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.gate_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.up_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.up_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.up_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.108.up_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.down_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.down_proj.qweight": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.down_proj.qzeros": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.down_proj.scales": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.gate_proj.g_idx": "model-00003-of-00005.safetensors",
+"model.layers.34.mlp.experts.109.gate_proj.qweight": "model-00003-of-00005.safetensors",
|
43839 |
+
"model.layers.34.mlp.experts.109.gate_proj.qzeros": "model-00003-of-00005.safetensors",
|
43840 |
+
"model.layers.34.mlp.experts.109.gate_proj.scales": "model-00003-of-00005.safetensors",
|
43841 |
+
"model.layers.34.mlp.experts.109.up_proj.g_idx": "model-00003-of-00005.safetensors",
|
43842 |
+
"model.layers.34.mlp.experts.109.up_proj.qweight": "model-00003-of-00005.safetensors",
|
43843 |
+
"model.layers.34.mlp.experts.109.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
43844 |
+
"model.layers.34.mlp.experts.109.up_proj.scales": "model-00003-of-00005.safetensors",
|
43845 |
"model.layers.34.mlp.experts.11.down_proj.g_idx": "model-00003-of-00005.safetensors",
|
43846 |
"model.layers.34.mlp.experts.11.down_proj.qweight": "model-00003-of-00005.safetensors",
|
43847 |
"model.layers.34.mlp.experts.11.down_proj.qzeros": "model-00003-of-00005.safetensors",
|
|
|
43858 |
"model.layers.34.mlp.experts.110.down_proj.qweight": "model-00004-of-00005.safetensors",
|
43859 |
"model.layers.34.mlp.experts.110.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
43860 |
"model.layers.34.mlp.experts.110.down_proj.scales": "model-00004-of-00005.safetensors",
|
43861 |
+
"model.layers.34.mlp.experts.110.gate_proj.g_idx": "model-00003-of-00005.safetensors",
|
43862 |
+
"model.layers.34.mlp.experts.110.gate_proj.qweight": "model-00003-of-00005.safetensors",
|
43863 |
+
"model.layers.34.mlp.experts.110.gate_proj.qzeros": "model-00003-of-00005.safetensors",
|
43864 |
+
"model.layers.34.mlp.experts.110.gate_proj.scales": "model-00003-of-00005.safetensors",
|
43865 |
"model.layers.34.mlp.experts.110.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
43866 |
"model.layers.34.mlp.experts.110.up_proj.qweight": "model-00004-of-00005.safetensors",
|
43867 |
"model.layers.34.mlp.experts.110.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
|
|
45222 |
"model.layers.34.mlp.experts.99.up_proj.qweight": "model-00003-of-00005.safetensors",
|
45223 |
"model.layers.34.mlp.experts.99.up_proj.qzeros": "model-00003-of-00005.safetensors",
|
45224 |
"model.layers.34.mlp.experts.99.up_proj.scales": "model-00003-of-00005.safetensors",
|
45225 |
+
"model.layers.34.mlp.gate.g_idx": "model-00003-of-00005.safetensors",
|
45226 |
+
"model.layers.34.mlp.gate.qweight": "model-00003-of-00005.safetensors",
|
45227 |
+
"model.layers.34.mlp.gate.qzeros": "model-00003-of-00005.safetensors",
|
45228 |
+
"model.layers.34.mlp.gate.scales": "model-00003-of-00005.safetensors",
|
45229 |
"model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
45230 |
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
45231 |
"model.layers.34.self_attn.k_proj.g_idx": "model-00003-of-00005.safetensors",
|
|
|
46782 |
"model.layers.35.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
46783 |
"model.layers.35.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
46784 |
"model.layers.35.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
46785 |
+
"model.layers.35.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
46786 |
+
"model.layers.35.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
46787 |
+
"model.layers.35.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
46788 |
+
"model.layers.35.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
46789 |
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
46790 |
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
46791 |
"model.layers.35.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
48342 |
"model.layers.36.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
48343 |
"model.layers.36.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
48344 |
"model.layers.36.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
48345 |
+
"model.layers.36.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
48346 |
+
"model.layers.36.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
48347 |
+
"model.layers.36.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
48348 |
+
"model.layers.36.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
48349 |
"model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
48350 |
"model.layers.36.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
48351 |
"model.layers.36.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
49902 |
"model.layers.37.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
49903 |
"model.layers.37.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
49904 |
"model.layers.37.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
49905 |
+
"model.layers.37.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
49906 |
+
"model.layers.37.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
49907 |
+
"model.layers.37.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
49908 |
+
"model.layers.37.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
49909 |
"model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
49910 |
"model.layers.37.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
49911 |
"model.layers.37.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
51462 |
"model.layers.38.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
51463 |
"model.layers.38.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
51464 |
"model.layers.38.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
51465 |
+
"model.layers.38.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
51466 |
+
"model.layers.38.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
51467 |
+
"model.layers.38.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
51468 |
+
"model.layers.38.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
51469 |
"model.layers.38.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
51470 |
"model.layers.38.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
51471 |
"model.layers.38.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
53022 |
"model.layers.39.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
53023 |
"model.layers.39.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
53024 |
"model.layers.39.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
53025 |
+
"model.layers.39.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
53026 |
+
"model.layers.39.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
53027 |
+
"model.layers.39.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
53028 |
+
"model.layers.39.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
53029 |
"model.layers.39.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
53030 |
"model.layers.39.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
53031 |
"model.layers.39.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
54582 |
"model.layers.4.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
54583 |
"model.layers.4.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
54584 |
"model.layers.4.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
54585 |
+
"model.layers.4.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
54586 |
+
"model.layers.4.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
54587 |
+
"model.layers.4.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
54588 |
+
"model.layers.4.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
54589 |
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
54590 |
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
54591 |
"model.layers.4.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
|
|
56142 |
"model.layers.40.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
56143 |
"model.layers.40.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
56144 |
"model.layers.40.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
56145 |
+
"model.layers.40.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
56146 |
+
"model.layers.40.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
56147 |
+
"model.layers.40.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
56148 |
+
"model.layers.40.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
56149 |
"model.layers.40.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
56150 |
"model.layers.40.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
56151 |
"model.layers.40.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
57702 |
"model.layers.41.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
57703 |
"model.layers.41.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
57704 |
"model.layers.41.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
57705 |
+
"model.layers.41.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
57706 |
+
"model.layers.41.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
57707 |
+
"model.layers.41.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
57708 |
+
"model.layers.41.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
57709 |
"model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
57710 |
"model.layers.41.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
57711 |
"model.layers.41.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
59262 |
"model.layers.42.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
59263 |
"model.layers.42.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
59264 |
"model.layers.42.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
59265 |
+
"model.layers.42.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
59266 |
+
"model.layers.42.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
59267 |
+
"model.layers.42.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
59268 |
+
"model.layers.42.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
59269 |
"model.layers.42.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
59270 |
"model.layers.42.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
59271 |
"model.layers.42.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
60822 |
"model.layers.43.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
60823 |
"model.layers.43.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
60824 |
"model.layers.43.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
60825 |
+
"model.layers.43.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
60826 |
+
"model.layers.43.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
60827 |
+
"model.layers.43.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
60828 |
+
"model.layers.43.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
60829 |
"model.layers.43.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
60830 |
"model.layers.43.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
60831 |
"model.layers.43.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
62382 |
"model.layers.44.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
62383 |
"model.layers.44.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
62384 |
"model.layers.44.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
62385 |
+
"model.layers.44.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
62386 |
+
"model.layers.44.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
62387 |
+
"model.layers.44.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
62388 |
+
"model.layers.44.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
62389 |
"model.layers.44.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
62390 |
"model.layers.44.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
62391 |
"model.layers.44.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
63942 |
"model.layers.45.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
63943 |
"model.layers.45.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
63944 |
"model.layers.45.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
63945 |
+
"model.layers.45.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
63946 |
+
"model.layers.45.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
63947 |
+
"model.layers.45.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
63948 |
+
"model.layers.45.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
63949 |
"model.layers.45.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
63950 |
"model.layers.45.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
63951 |
"model.layers.45.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
65502 |
"model.layers.46.mlp.experts.99.up_proj.qweight": "model-00004-of-00005.safetensors",
|
65503 |
"model.layers.46.mlp.experts.99.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
65504 |
"model.layers.46.mlp.experts.99.up_proj.scales": "model-00004-of-00005.safetensors",
|
65505 |
+
"model.layers.46.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
65506 |
+
"model.layers.46.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
65507 |
+
"model.layers.46.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
65508 |
+
"model.layers.46.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
65509 |
"model.layers.46.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
65510 |
"model.layers.46.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
65511 |
"model.layers.46.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
65550 |
"model.layers.47.mlp.experts.1.up_proj.qweight": "model-00004-of-00005.safetensors",
|
65551 |
"model.layers.47.mlp.experts.1.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
65552 |
"model.layers.47.mlp.experts.1.up_proj.scales": "model-00004-of-00005.safetensors",
|
65553 |
+
"model.layers.47.mlp.experts.10.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
65554 |
+
"model.layers.47.mlp.experts.10.down_proj.qweight": "model-00004-of-00005.safetensors",
|
65555 |
+
"model.layers.47.mlp.experts.10.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
65556 |
+
"model.layers.47.mlp.experts.10.down_proj.scales": "model-00004-of-00005.safetensors",
|
65557 |
+
"model.layers.47.mlp.experts.10.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
65558 |
+
"model.layers.47.mlp.experts.10.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
65559 |
+
"model.layers.47.mlp.experts.10.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
65560 |
+
"model.layers.47.mlp.experts.10.gate_proj.scales": "model-00004-of-00005.safetensors",
|
65561 |
+
"model.layers.47.mlp.experts.10.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
65562 |
+
"model.layers.47.mlp.experts.10.up_proj.qweight": "model-00004-of-00005.safetensors",
|
65563 |
+
"model.layers.47.mlp.experts.10.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
65564 |
+
"model.layers.47.mlp.experts.10.up_proj.scales": "model-00004-of-00005.safetensors",
|
65565 |
"model.layers.47.mlp.experts.100.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
65566 |
"model.layers.47.mlp.experts.100.down_proj.qweight": "model-00005-of-00005.safetensors",
|
65567 |
"model.layers.47.mlp.experts.100.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
65682 |
"model.layers.47.mlp.experts.109.up_proj.qweight": "model-00005-of-00005.safetensors",
|
65683 |
"model.layers.47.mlp.experts.109.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
65684 |
"model.layers.47.mlp.experts.109.up_proj.scales": "model-00005-of-00005.safetensors",
|
65685 |
+
"model.layers.47.mlp.experts.11.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
65686 |
+
"model.layers.47.mlp.experts.11.down_proj.qweight": "model-00004-of-00005.safetensors",
|
65687 |
+
"model.layers.47.mlp.experts.11.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
65688 |
+
"model.layers.47.mlp.experts.11.down_proj.scales": "model-00004-of-00005.safetensors",
|
65689 |
+
"model.layers.47.mlp.experts.11.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
65690 |
+
"model.layers.47.mlp.experts.11.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
65691 |
+
"model.layers.47.mlp.experts.11.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
65692 |
+
"model.layers.47.mlp.experts.11.gate_proj.scales": "model-00004-of-00005.safetensors",
|
65693 |
+
"model.layers.47.mlp.experts.11.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
65694 |
+
"model.layers.47.mlp.experts.11.up_proj.qweight": "model-00004-of-00005.safetensors",
|
65695 |
+
"model.layers.47.mlp.experts.11.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
65696 |
+
"model.layers.47.mlp.experts.11.up_proj.scales": "model-00004-of-00005.safetensors",
|
65697 |
"model.layers.47.mlp.experts.110.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
65698 |
"model.layers.47.mlp.experts.110.down_proj.qweight": "model-00005-of-00005.safetensors",
|
65699 |
"model.layers.47.mlp.experts.110.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
65818 |
"model.layers.47.mlp.experts.12.down_proj.qweight": "model-00005-of-00005.safetensors",
|
65819 |
"model.layers.47.mlp.experts.12.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
65820 |
"model.layers.47.mlp.experts.12.down_proj.scales": "model-00005-of-00005.safetensors",
|
65821 |
+
"model.layers.47.mlp.experts.12.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
65822 |
+
"model.layers.47.mlp.experts.12.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
65823 |
+
"model.layers.47.mlp.experts.12.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
65824 |
+
"model.layers.47.mlp.experts.12.gate_proj.scales": "model-00004-of-00005.safetensors",
|
65825 |
"model.layers.47.mlp.experts.12.up_proj.g_idx": "model-00005-of-00005.safetensors",
|
65826 |
"model.layers.47.mlp.experts.12.up_proj.qweight": "model-00005-of-00005.safetensors",
|
65827 |
"model.layers.47.mlp.experts.12.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
66402 |
"model.layers.47.mlp.experts.49.up_proj.qweight": "model-00005-of-00005.safetensors",
|
66403 |
"model.layers.47.mlp.experts.49.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
66404 |
"model.layers.47.mlp.experts.49.up_proj.scales": "model-00005-of-00005.safetensors",
|
66405 |
+
"model.layers.47.mlp.experts.5.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
66406 |
+
"model.layers.47.mlp.experts.5.down_proj.qweight": "model-00004-of-00005.safetensors",
|
66407 |
+
"model.layers.47.mlp.experts.5.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
66408 |
+
"model.layers.47.mlp.experts.5.down_proj.scales": "model-00004-of-00005.safetensors",
|
66409 |
"model.layers.47.mlp.experts.5.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
66410 |
"model.layers.47.mlp.experts.5.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
66411 |
"model.layers.47.mlp.experts.5.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
66412 |
"model.layers.47.mlp.experts.5.gate_proj.scales": "model-00004-of-00005.safetensors",
|
66413 |
+
"model.layers.47.mlp.experts.5.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
66414 |
+
"model.layers.47.mlp.experts.5.up_proj.qweight": "model-00004-of-00005.safetensors",
|
66415 |
+
"model.layers.47.mlp.experts.5.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
66416 |
+
"model.layers.47.mlp.experts.5.up_proj.scales": "model-00004-of-00005.safetensors",
|
66417 |
"model.layers.47.mlp.experts.50.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
66418 |
"model.layers.47.mlp.experts.50.down_proj.qweight": "model-00005-of-00005.safetensors",
|
66419 |
"model.layers.47.mlp.experts.50.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
66534 |
"model.layers.47.mlp.experts.59.up_proj.qweight": "model-00005-of-00005.safetensors",
|
66535 |
"model.layers.47.mlp.experts.59.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
66536 |
"model.layers.47.mlp.experts.59.up_proj.scales": "model-00005-of-00005.safetensors",
|
66537 |
+
"model.layers.47.mlp.experts.6.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
66538 |
+
"model.layers.47.mlp.experts.6.down_proj.qweight": "model-00004-of-00005.safetensors",
|
66539 |
+
"model.layers.47.mlp.experts.6.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
66540 |
+
"model.layers.47.mlp.experts.6.down_proj.scales": "model-00004-of-00005.safetensors",
|
66541 |
+
"model.layers.47.mlp.experts.6.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
66542 |
+
"model.layers.47.mlp.experts.6.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
66543 |
+
"model.layers.47.mlp.experts.6.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
66544 |
+
"model.layers.47.mlp.experts.6.gate_proj.scales": "model-00004-of-00005.safetensors",
|
66545 |
+
"model.layers.47.mlp.experts.6.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
66546 |
+
"model.layers.47.mlp.experts.6.up_proj.qweight": "model-00004-of-00005.safetensors",
|
66547 |
+
"model.layers.47.mlp.experts.6.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
66548 |
+
"model.layers.47.mlp.experts.6.up_proj.scales": "model-00004-of-00005.safetensors",
|
66549 |
"model.layers.47.mlp.experts.60.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
66550 |
"model.layers.47.mlp.experts.60.down_proj.qweight": "model-00005-of-00005.safetensors",
|
66551 |
"model.layers.47.mlp.experts.60.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
66666 |
"model.layers.47.mlp.experts.69.up_proj.qweight": "model-00005-of-00005.safetensors",
|
66667 |
"model.layers.47.mlp.experts.69.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
66668 |
"model.layers.47.mlp.experts.69.up_proj.scales": "model-00005-of-00005.safetensors",
|
66669 |
+
"model.layers.47.mlp.experts.7.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
66670 |
+
"model.layers.47.mlp.experts.7.down_proj.qweight": "model-00004-of-00005.safetensors",
|
66671 |
+
"model.layers.47.mlp.experts.7.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
66672 |
+
"model.layers.47.mlp.experts.7.down_proj.scales": "model-00004-of-00005.safetensors",
|
66673 |
+
"model.layers.47.mlp.experts.7.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
66674 |
+
"model.layers.47.mlp.experts.7.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
66675 |
+
"model.layers.47.mlp.experts.7.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
66676 |
+
"model.layers.47.mlp.experts.7.gate_proj.scales": "model-00004-of-00005.safetensors",
|
66677 |
+
"model.layers.47.mlp.experts.7.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
66678 |
+
"model.layers.47.mlp.experts.7.up_proj.qweight": "model-00004-of-00005.safetensors",
|
66679 |
+
"model.layers.47.mlp.experts.7.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
66680 |
+
"model.layers.47.mlp.experts.7.up_proj.scales": "model-00004-of-00005.safetensors",
|
66681 |
"model.layers.47.mlp.experts.70.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
66682 |
"model.layers.47.mlp.experts.70.down_proj.qweight": "model-00005-of-00005.safetensors",
|
66683 |
"model.layers.47.mlp.experts.70.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
66798 |
"model.layers.47.mlp.experts.79.up_proj.qweight": "model-00005-of-00005.safetensors",
|
66799 |
"model.layers.47.mlp.experts.79.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
66800 |
"model.layers.47.mlp.experts.79.up_proj.scales": "model-00005-of-00005.safetensors",
|
66801 |
+
"model.layers.47.mlp.experts.8.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
66802 |
+
"model.layers.47.mlp.experts.8.down_proj.qweight": "model-00004-of-00005.safetensors",
|
66803 |
+
"model.layers.47.mlp.experts.8.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
66804 |
+
"model.layers.47.mlp.experts.8.down_proj.scales": "model-00004-of-00005.safetensors",
|
66805 |
+
"model.layers.47.mlp.experts.8.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
66806 |
+
"model.layers.47.mlp.experts.8.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
66807 |
+
"model.layers.47.mlp.experts.8.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
66808 |
+
"model.layers.47.mlp.experts.8.gate_proj.scales": "model-00004-of-00005.safetensors",
|
66809 |
+
"model.layers.47.mlp.experts.8.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
66810 |
+
"model.layers.47.mlp.experts.8.up_proj.qweight": "model-00004-of-00005.safetensors",
|
66811 |
+
"model.layers.47.mlp.experts.8.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
66812 |
+
"model.layers.47.mlp.experts.8.up_proj.scales": "model-00004-of-00005.safetensors",
|
66813 |
"model.layers.47.mlp.experts.80.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
66814 |
"model.layers.47.mlp.experts.80.down_proj.qweight": "model-00005-of-00005.safetensors",
|
66815 |
"model.layers.47.mlp.experts.80.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
66930 |
"model.layers.47.mlp.experts.89.up_proj.qweight": "model-00005-of-00005.safetensors",
|
66931 |
"model.layers.47.mlp.experts.89.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
66932 |
"model.layers.47.mlp.experts.89.up_proj.scales": "model-00005-of-00005.safetensors",
|
66933 |
+
"model.layers.47.mlp.experts.9.down_proj.g_idx": "model-00004-of-00005.safetensors",
|
66934 |
+
"model.layers.47.mlp.experts.9.down_proj.qweight": "model-00004-of-00005.safetensors",
|
66935 |
+
"model.layers.47.mlp.experts.9.down_proj.qzeros": "model-00004-of-00005.safetensors",
|
66936 |
+
"model.layers.47.mlp.experts.9.down_proj.scales": "model-00004-of-00005.safetensors",
|
66937 |
+
"model.layers.47.mlp.experts.9.gate_proj.g_idx": "model-00004-of-00005.safetensors",
|
66938 |
+
"model.layers.47.mlp.experts.9.gate_proj.qweight": "model-00004-of-00005.safetensors",
|
66939 |
+
"model.layers.47.mlp.experts.9.gate_proj.qzeros": "model-00004-of-00005.safetensors",
|
66940 |
+
"model.layers.47.mlp.experts.9.gate_proj.scales": "model-00004-of-00005.safetensors",
|
66941 |
+
"model.layers.47.mlp.experts.9.up_proj.g_idx": "model-00004-of-00005.safetensors",
|
66942 |
+
"model.layers.47.mlp.experts.9.up_proj.qweight": "model-00004-of-00005.safetensors",
|
66943 |
+
"model.layers.47.mlp.experts.9.up_proj.qzeros": "model-00004-of-00005.safetensors",
|
66944 |
+
"model.layers.47.mlp.experts.9.up_proj.scales": "model-00004-of-00005.safetensors",
|
66945 |
"model.layers.47.mlp.experts.90.down_proj.g_idx": "model-00005-of-00005.safetensors",
|
66946 |
"model.layers.47.mlp.experts.90.down_proj.qweight": "model-00005-of-00005.safetensors",
|
66947 |
"model.layers.47.mlp.experts.90.down_proj.qzeros": "model-00005-of-00005.safetensors",
|
|
|
67062 |
"model.layers.47.mlp.experts.99.up_proj.qweight": "model-00005-of-00005.safetensors",
|
67063 |
"model.layers.47.mlp.experts.99.up_proj.qzeros": "model-00005-of-00005.safetensors",
|
67064 |
"model.layers.47.mlp.experts.99.up_proj.scales": "model-00005-of-00005.safetensors",
|
67065 |
+
"model.layers.47.mlp.gate.g_idx": "model-00004-of-00005.safetensors",
|
67066 |
+
"model.layers.47.mlp.gate.qweight": "model-00004-of-00005.safetensors",
|
67067 |
+
"model.layers.47.mlp.gate.qzeros": "model-00004-of-00005.safetensors",
|
67068 |
+
"model.layers.47.mlp.gate.scales": "model-00004-of-00005.safetensors",
|
67069 |
"model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
67070 |
"model.layers.47.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
|
67071 |
"model.layers.47.self_attn.k_proj.g_idx": "model-00004-of-00005.safetensors",
|
|
|
68622 |
"model.layers.5.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
68623 |
"model.layers.5.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
68624 |
"model.layers.5.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
68625 |
+
"model.layers.5.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
68626 |
+
"model.layers.5.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
68627 |
+
"model.layers.5.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
68628 |
+
"model.layers.5.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
68629 |
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
68630 |
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
68631 |
"model.layers.5.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
|
|
70182 |
"model.layers.6.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
70183 |
"model.layers.6.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
70184 |
"model.layers.6.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
70185 |
+
"model.layers.6.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
70186 |
+
"model.layers.6.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
70187 |
+
"model.layers.6.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
70188 |
+
"model.layers.6.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
70189 |
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
70190 |
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
70191 |
"model.layers.6.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
|
|
71742 |
"model.layers.7.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
71743 |
"model.layers.7.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
71744 |
"model.layers.7.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
71745 |
+
"model.layers.7.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
71746 |
+
"model.layers.7.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
71747 |
+
"model.layers.7.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
71748 |
+
"model.layers.7.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
71749 |
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
71750 |
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
71751 |
"model.layers.7.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
|
|
73302 |
"model.layers.8.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
73303 |
"model.layers.8.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
73304 |
"model.layers.8.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
73305 |
+
"model.layers.8.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
73306 |
+
"model.layers.8.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
73307 |
+
"model.layers.8.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
73308 |
+
"model.layers.8.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
73309 |
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
73310 |
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
73311 |
"model.layers.8.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
|
|
74862 |
"model.layers.9.mlp.experts.99.up_proj.qweight": "model-00001-of-00005.safetensors",
|
74863 |
"model.layers.9.mlp.experts.99.up_proj.qzeros": "model-00001-of-00005.safetensors",
|
74864 |
"model.layers.9.mlp.experts.99.up_proj.scales": "model-00001-of-00005.safetensors",
|
74865 |
+
"model.layers.9.mlp.gate.g_idx": "model-00001-of-00005.safetensors",
|
74866 |
+
"model.layers.9.mlp.gate.qweight": "model-00001-of-00005.safetensors",
|
74867 |
+
"model.layers.9.mlp.gate.qzeros": "model-00001-of-00005.safetensors",
|
74868 |
+
"model.layers.9.mlp.gate.scales": "model-00001-of-00005.safetensors",
|
74869 |
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
74870 |
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
74871 |
"model.layers.9.self_attn.k_proj.g_idx": "model-00001-of-00005.safetensors",
|
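The `model.safetensors.index.json` hunks above all make the same kind of change: for every MoE expert projection (`gate_proj`, `up_proj`, `down_proj`) and for each layer's router `mlp.gate`, the requantized checkpoint now stores four GPTQ tensors (`g_idx`, `qweight`, `qzeros`, `scales`), and the index maps each tensor name to the shard that holds it. A minimal sketch of how a loader resolves one tensor through this index — the `json`/`safetensors` calls are standard library usage, but the `load_tensor` helper itself is illustrative, not part of this repo:

```python
import json
from safetensors import safe_open

def load_tensor(repo_dir: str, name: str):
    # The index maps tensor names to shard files, e.g.
    # "model.layers.47.mlp.gate.qweight" -> "model-00004-of-00005.safetensors".
    with open(f"{repo_dir}/model.safetensors.index.json") as fp:
        shard = json.load(fp)["weight_map"][name]
    # Open only the shard that holds this tensor and read it lazily.
    with safe_open(f"{repo_dir}/{shard}", framework="pt") as st:
        return st.get_tensor(name)

# One of the router-gate tensors added by this commit:
# qweight = load_tensor(".", "model.layers.47.mlp.gate.qweight")
```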
quantize_config.json
CHANGED
@@ -9,13 +9,15 @@
   "pack_dtype": "int32",
   "meta": {
     "quantizer": [
-      "gptqmodel:
+      "gptqmodel:4.0.0-dev"
     ],
     "uri": "https://github.com/modelcloud/gptqmodel",
-    "damp_percent": 0.
-    "damp_auto_increment": 0.
+    "damp_percent": 0.05,
+    "damp_auto_increment": 0.01,
     "static_groups": false,
     "true_sequential": true,
-    "mse": 0.0
+    "mse": 0.0,
+    "v2": false,
+    "v2_alpha": 0.25
   }
 }
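The updated `quantize_config.json` records how GPTQModel ran: `damp_percent` is the relative dampening added to the Hessian diagonal during GPTQ, `damp_auto_increment` the step by which it is raised when factorization fails, and `v2`/`v2_alpha` appear to toggle the (here disabled) GPTQ v2 objective. At load time the four tensors per quantized module combine via the usual GPTQ affine dequantization; below is a sketch under assumed conventions — 4-bit weights packed into the declared int32 `pack_dtype` with a gptqmodel-style layout — none of which is verified against this checkpoint:

```python
import torch

def dequantize_gptq(qweight, qzeros, scales, g_idx, bits=4):
    """Sketch of GPTQ dequantization for int32-packed tensors.

    Assumed layout (typical of gptqmodel, not confirmed here):
      qweight: (in_features // (32 // bits), out_features)  int32
      qzeros:  (n_groups, out_features // (32 // bits))     int32
      scales:  (n_groups, out_features)                     float
      g_idx:   (in_features,) quantization group of each input row
    """
    shifts = torch.arange(0, 32, bits, dtype=torch.int32, device=qweight.device)
    mask = (1 << bits) - 1

    # Unpack 32 // bits quantized values out of every int32 word (rows).
    w = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & mask
    w = w.reshape(-1, qweight.shape[1])
    # Same unpacking for the zero points (columns).
    z = (qzeros.unsqueeze(2) >> shifts.view(1, 1, -1)) & mask
    z = z.reshape(qzeros.shape[0], -1)
    # Note: some legacy GPTQ checkpoints store zero points off by one (z + 1);
    # recent gptqmodel releases do not.

    # Per-group affine map, group chosen per input row via g_idx.
    return (w.float() - z.float()[g_idx]) * scales.float()[g_idx]
```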