/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm gen_config ../dist/models/Qwen1.5-4B --quantization q4f32_1 --conv-template chatml --output /tmp/tmp78htwu3y
[2024-03-18 19:18:16] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:18:16] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:18:16] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:18:16] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
[2024-03-18 19:18:16] WARNING config.py:99: Warning: Cannot override max_batch_size, because QWen2Config does not have this field
[2024-03-18 19:18:16] INFO gen_config.py:133: [generation_config.json] Setting bos_token_id: 151643
[2024-03-18 19:18:16] INFO gen_config.py:133: [generation_config.json] Setting eos_token_id: 151643
[2024-03-18 19:18:16] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.model
[2024-03-18 19:18:16] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer.json. Copying to /tmp/tmp78htwu3y/tokenizer.json
[2024-03-18 19:18:16] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/vocab.json. Copying to /tmp/tmp78htwu3y/vocab.json
[2024-03-18 19:18:16] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/merges.txt. Copying to /tmp/tmp78htwu3y/merges.txt
[2024-03-18 19:18:16] INFO gen_config.py:147: Not found tokenizer config: ../dist/models/Qwen1.5-4B/added_tokens.json
[2024-03-18 19:18:16] INFO gen_config.py:145: Found tokenizer config: ../dist/models/Qwen1.5-4B/tokenizer_config.json. Copying to /tmp/tmp78htwu3y/tokenizer_config.json
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting pad_token_id: 0
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting temperature: 0.7
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting presence_penalty: 0.0
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting frequency_penalty: 0.0
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting repetition_penalty: 1.0
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting top_p: 0.95
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting mean_gen_len: 128
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting max_gen_len: 512
[2024-03-18 19:18:16] INFO gen_config.py:75: [System default] Setting shift_fill_factor: 0.3
[2024-03-18 19:18:16] INFO gen_config.py:198: Dumping configuration file to: /tmp/tmp78htwu3y/mlc-chat-config.json
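The [System default] values above, together with the detected model type, conversation template, and context sizes, are what gen_config writes into mlc-chat-config.json. A minimal sketch of those fields in Python, reassembled from the log messages alone; it is not a dump of the real file, and exact key names or nesting may differ:

    import json

    # Sketch only: fields reconstructed from the gen_config log above,
    # not copied from the actual /tmp/tmp78htwu3y/mlc-chat-config.json.
    mlc_chat_config = {
        "model_type": "qwen2",
        "quantization": "q4f32_1",
        "conv_template": "chatml",
        "context_window_size": 32768,
        "prefill_chunk_size": 32768,
        "bos_token_id": 151643,
        "eos_token_id": 151643,
        "pad_token_id": 0,
        "temperature": 0.7,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "repetition_penalty": 1.0,
        "top_p": 0.95,
        "mean_gen_len": 128,
        "max_gen_len": 512,
        "shift_fill_factor": 0.3,
    }
    print(json.dumps(mlc_chat_config, indent=2))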
/home/floriadmin/miniforge3/envs/mlc/bin/python -m mlc_llm convert_weight ../dist/models/Qwen1.5-4B --quantization q4f32_1 --source-format auto --output /tmp/tmp78htwu3y
[2024-03-18 19:18:17] INFO auto_config.py:115: Found model configuration: ../dist/models/Qwen1.5-4B/config.json
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:0
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:1
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:2
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:3
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:4
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:5
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:6
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:7
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:8
[2024-03-18 19:18:18] INFO auto_device.py:76: Found device: cuda:9
[2024-03-18 19:18:19] INFO auto_device.py:85: Not found device: rocm:0
[2024-03-18 19:18:20] INFO auto_device.py:85: Not found device: metal:0
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:0
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:1
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:2
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:3
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:4
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:5
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:6
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:7
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:8
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:9
[2024-03-18 19:18:22] INFO auto_device.py:76: Found device: vulkan:10
[2024-03-18 19:18:23] INFO auto_device.py:85: Not found device: opencl:0
[2024-03-18 19:18:23] INFO auto_device.py:33: Using device: cuda:0
[2024-03-18 19:18:23] INFO auto_weight.py:70: Finding weights in: ../dist/models/Qwen1.5-4B
[2024-03-18 19:18:23] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-03-18 19:18:23] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json
[2024-03-18 19:18:23] INFO auto_weight.py:106: Using source weight configuration: ../dist/models/Qwen1.5-4B/model.safetensors.index.json. Use `--source` to override.
[2024-03-18 19:18:23] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-03-18 19:18:23] INFO auto_config.py:153: Found model type: qwen2. Use `--model-type` to override.
[2024-03-18 19:18:23] INFO qwen2_model.py:46: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-03-18 19:18:23] INFO qwen2_model.py:60: prefill_chunk_size defaults to context_window_size (32768)
Weight conversion with arguments:
  --config          ../dist/models/Qwen1.5-4B/config.json
  --quantization    GroupQuantize(name='q4f32_1', kind='group-quant', group_size=40, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=5, max_int_value=7)
  --model-type      qwen2
  --device          cuda:0
  --source          ../dist/models/Qwen1.5-4B/model.safetensors.index.json
  --source-format   huggingface-safetensor
  --output          /tmp/tmp78htwu3y
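The GroupQuantize arguments fully determine the quantized tensor shapes reported below: each weight is cut along axis 1 into groups of 40 float32 values, and each group is packed into 5 uint32 words (8 int4 values per word) plus one float32 scale. A small self-check of that arithmetic against the shapes in the log; the helper is illustrative, not part of mlc_llm:

    from math import ceil

    def q4f32_1_shapes(rows, cols, group_size=40, num_elem_per_storage=8):
        """Expected (q_weight, q_scale) shapes when quantizing along axis=1."""
        groups = ceil(cols / group_size)  # one float32 scale per group
        words = groups * (group_size // num_elem_per_storage)  # 5 uint32 words per group
        return (rows, words), (rows, groups)

    # These match the loader output below:
    assert q4f32_1_shapes(2560, 6912) == ((2560, 865), (2560, 173))       # mlp.down_proj
    assert q4f32_1_shapes(13824, 2560) == ((13824, 320), (13824, 64))     # mlp.gate_up_proj
    assert q4f32_1_shapes(7680, 2560) == ((7680, 320), (7680, 64))        # self_attn.c_attn
    assert q4f32_1_shapes(151936, 2560) == ((151936, 320), (151936, 64))  # embed_tokens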
Start storing to cache /tmp/tmp78htwu3y
  0%|          | 0/283 [00:00<?, ?it/s]
/home/floriadmin/miniforge3/envs/mlc/lib/python3.11/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/floriadmin/miniforge3/envs/mlc/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[2024-03-18 19:18:47] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:47] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 6912), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:18:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:48] INFO group_quantization.py:232: Compiling quantize function for key: ((13824, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:18:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:49] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:49] INFO group_quantization.py:232: Compiling quantize function for key: ((2560, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:18:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:49] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.21.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:50] INFO group_quantization.py:232: Compiling quantize function for key: ((7680, 2560), float32, cuda, axis=1, output_transpose=False)
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:51] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.22.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
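Note that "Compiling quantize function" lines appear only the first time a weight shape is seen: layer 20 triggers compiles for (2560, 6912), (13824, 2560) and (2560, 2560), and layer 21 adds (7680, 2560), the fused QKV projection (3 x 2560 output rows in one matmul). Every later layer reuses those cached kernels, which is why the timestamps show later layers converting in well under a second. A toy sketch of that shape-keyed caching, illustrative only and not mlc_llm's actual code:

    # Compile once per (shape, dtype, device, axis, output_transpose) key,
    # then reuse the kernel for every parameter with the same signature.
    _quantize_fns = {}

    def get_quantize_fn(shape, dtype="float32", device="cuda",
                        axis=1, output_transpose=False):
        key = (shape, dtype, device, axis, output_transpose)
        if key not in _quantize_fns:
            print(f"Compiling quantize function for key: {key}")
            _quantize_fns[key] = lambda w: w  # stand-in for the compiled kernel
        return _quantize_fns[key]

    get_quantize_fn((2560, 6912))  # layer 20: compiles
    get_quantize_fn((2560, 6912))  # layers 21-39: cache hit, no log line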
[2024-03-18 19:18:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.23.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.24.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:55] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:55] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.25.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.26.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.27.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.28.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:18:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.29.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:00] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.30.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:01] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:01] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:02] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:02] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.31.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:02] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:03] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:03] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.32.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:04] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:04] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.33.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:05] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.34.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.34.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:06] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.35.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.35.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:07] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.36.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:08] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.36.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:08] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:09] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:09] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.37.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:09] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.37.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:10] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:10] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.38.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.38.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:11] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:11] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.39.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:12] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.39.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.norm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:12] INFO huggingface_loader.py:194: Unloading HF weight file: ../dist/models/Qwen1.5-4B/model-00002-of-00002.safetensors
[2024-03-18 19:19:13] INFO huggingface_loader.py:182: Loading HF parameters from: ../dist/models/Qwen1.5-4B/model-00001-of-00002.safetensors
[2024-03-18 19:19:34] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.embed_tokens.q_weight", shape: (151936, 320), dtype: uint32
[2024-03-18 19:19:35] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.embed_tokens.q_scale", shape: (151936, 64), dtype: float32
[2024-03-18 19:19:35] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:35] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:35] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
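The loader works shard by shard: it converts everything it can from model-00002-of-00002.safetensors, unloads it, then loads model-00001-of-00002.safetensors (the roughly 20-second gap before model.embed_tokens appears is dominated by reading the new shard and quantizing the 151936-row embedding). A rough sketch of that access pattern, driven by the index file named earlier; this is illustrative, not mlc_llm's loader, and assumes the safetensors package is installed:

    import json
    from collections import defaultdict
    from safetensors import safe_open

    model_dir = "../dist/models/Qwen1.5-4B"
    index = json.load(open(f"{model_dir}/model.safetensors.index.json"))

    # Group parameter names by the shard file that stores them.
    by_shard = defaultdict(list)
    for name, shard in index["weight_map"].items():
        by_shard[shard].append(name)

    # Visit one shard at a time so only one file is resident in memory.
    for shard, names in by_shard.items():
        with safe_open(f"{model_dir}/{shard}", framework="np") as f:
            for name in names:
                tensor = f.get_tensor(name)  # quantize-and-store would happen here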
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.0.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:36] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.1.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:37] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.1.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:37] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:38] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.10.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:38] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.10.self_attn.c_attn.bias", shape: (7680,), dtype: float32
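Note: the parameters stream in lexicographic rather than numeric layer order — layer 1 above is followed by layer 10, and layer 19 will later be followed by layer 2, then 20, then 3. This is consistent with the loader walking string-sorted parameter names (the sorting is an assumption about the implementation; the order itself is plain in the log):

    # String sorting puts "10" between "1" and "2", matching the order in the log.
    names = ["model.layers.%d.input_layernorm.weight" % i for i in range(21)]
    for name in sorted(names):
        print(name)  # layers.0, layers.1, layers.10, ..., layers.19, layers.2, layers.20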
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:38] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.10.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.11.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:39] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.11.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:39] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.11.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.11.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.12.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:40] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.12.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:40] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.12.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.12.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:41] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:41] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.13.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.13.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:42] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.14.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.14.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:43] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.15.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:44] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.15.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:44] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.16.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:45] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.16.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:45] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:46] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:46] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:46] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:46] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:46] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:46] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.17.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:46] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.17.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:47] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:47] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.18.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.18.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:48] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:48] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.19.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.19.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:49] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.2.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.2.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.20.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:50] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.20.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:50] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:51] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:51] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.3.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:51] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.3.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.4.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:52] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.4.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:52] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.4.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.4.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:53] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.5.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:53] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.5.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.5.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.5.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.6.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:54] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.6.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.6.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.6.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.7.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:55] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.7.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.7.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.7.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.8.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:56] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.8.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.8.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
[2024-03-18 19:19:57] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.8.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32
[2024-03-18 19:19:57] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.9.input_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.mlp.down_proj.q_weight", shape: (2560, 865), dtype: uint32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.mlp.down_proj.q_scale", shape: (2560, 173), dtype: float32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.mlp.gate_up_proj.q_weight", shape: (13824, 320), dtype: uint32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.mlp.gate_up_proj.q_scale", shape: (13824, 64), dtype: float32
[2024-03-18 19:19:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.9.post_attention_layernorm.weight", shape: (2560,), dtype: float32
[2024-03-18 19:19:58] INFO huggingface_loader.py:172: [Not quantized] Parameter: "model.layers.9.self_attn.c_attn.bias", shape: (7680,), dtype: float32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.self_attn.c_attn.q_weight", shape: (7680, 320), dtype: uint32
[2024-03-18 19:19:58] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.self_attn.c_attn.q_scale", shape: (7680, 64), dtype: float32
[2024-03-18 19:19:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32
huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.self_attn.o_proj.q_weight", shape: (2560, 320), dtype: uint32 100%|█████████████████████████████████████████████████████████████████████████████████████████▋| 282/283 [01:32<00:00, 6.17it/s] [2024-03-18 19:19:59] INFO huggingface_loader.py:164: [Quantized] Parameter: "model.layers.9.self_attn.o_proj.q_scale", shape: (2560, 64), dtype: float32 100%|█████████████████████████████████████████████████████████████████████████████████████████▋| 282/283 [01:32<00:00, 6.17it/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████| 283/283 [01:32<00:00, 3.07it/s] [2024-03-18 19:19:59] INFO huggingface_loader.py:194: Unloading HF weight file: ../dist/models/Qwen1.5-4B/model-00001-of-00002.safetensors [2024-03-18 19:19:59] INFO stats.py:76: Time usage: HF loading: 25.268 sec; Pre-quantization mapping: 9.640 sec; Quantization: 3.597 sec [2024-03-18 19:19:59] INFO stats.py:90: RAM usage: Peak RAM: 7.431 GB. Total bytes loaded from disk: 14.716 GB [2024-03-18 19:19:59] INFO convert_weight.py:156: Parameter size after quantization: 2.210 GB [2024-03-18 19:19:59] INFO convert_weight.py:161: Total parameters: 3,950,369,280 [2024-03-18 19:19:59] INFO convert_weight.py:162: Bits per parameter: 4.805 [2024-03-18 19:19:59] INFO convert_weight.py:167: Saved to directory: /tmp/tmp78htwu3y All finished, 83 total shards committed, record saved to /tmp/tmp78htwu3y/ndarray-cache.json Also saved a bf16 record to /tmp/tmp78htwu3y/ndarray-cache-b16.json
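
The q_weight/q_scale shapes in the listing above encode the q4f32_1 layout directly: each row packs 4-bit values eight to a uint32 word, with one float32 scale per 40-element group (readable off the log itself: o_proj q_weight has 320 x 8 = 2560 packed elements per row, and 2560 / 64 scales = 40 elements per group). Below is a minimal sketch of that packing arithmetic, not MLC's actual code; the 6912 down_proj input width is an assumption taken from Qwen1.5-4B's config.json.

import math

def q4f32_1_shapes(out_features: int, in_features: int,
                   group_size: int = 40, elems_per_uint32: int = 8):
    # Rows are padded up to a whole number of groups, then each group
    # of 40 int4 values is packed into 5 uint32 words plus 1 float32 scale.
    groups = math.ceil(in_features / group_size)        # q_scale columns
    words = groups * (group_size // elems_per_uint32)   # q_weight columns
    return (out_features, words), (out_features, groups)

# o_proj is 2560x2560: 64 groups -> 320 words per row.
print(q4f32_1_shapes(2560, 2560))   # ((2560, 320), (2560, 64))
# down_proj input 6912 (assumed): padded to 173 groups = 6920 elements,
# which is why the log shows 865 words rather than 6912/8 = 864.
print(q4f32_1_shapes(2560, 6912))   # ((2560, 865), (2560, 173))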
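
The closing stats are mutually consistent: 4 bits of payload plus a 32-bit scale shared across 40 elements gives 4 + 32/40 = 4.8 bits per quantized element, and the float32 layernorm weights and c_attn biases left unquantized pull the logged average up slightly, to 4.805. A quick sanity check, assuming the logged "GB" figures are GiB (2**30 bytes):

GROUP_SIZE = 40                          # from the q_weight/q_scale shapes above
per_quantized = 4 + 32 / GROUP_SIZE      # int4 payload + float32 scale share
print(per_quantized)                     # 4.8 bits per quantized element

total_params = 3_950_369_280             # "Total parameters" in the log
size_bits = 2.210 * 2**30 * 8            # "Parameter size after quantization"
print(round(size_bits / total_params, 3))
# ~4.806 -- consistent with the logged 4.805 (the 2.210 GB figure is rounded)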
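
The committed-shard record is plain JSON and can be inspected directly if you want to double-check the "83 total shards" figure. A hedged sketch; the "records" and "dataPath" field names are assumptions based on the tvmjs ndarray-cache layout, not something this log confirms:

import json

# Load the shard record written at the end of conversion.
with open("/tmp/tmp78htwu3y/ndarray-cache.json") as f:
    cache = json.load(f)

# One entry per committed shard file (assumed layout).
shards = cache.get("records", [])
print(len(shards))                 # expect 83, matching the log above
if shards:
    print(shards[0].get("dataPath"))   # e.g. the first params_shard_*.bin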