vwxyzjn committed on Mar 12

Commit 6270951 (verified)

1 Parent(s): 77db2fb

Add files using upload-large-folder tool

Files changed (23)

README.md +207 -118
config.json +28 -0
generation_config.json +7 -0
merges.txt +0 -0
metadata.json +1 -0
pytorch_model-00001-of-00014.bin +3 -0
pytorch_model-00002-of-00014.bin +3 -0
pytorch_model-00003-of-00014.bin +3 -0
pytorch_model-00004-of-00014.bin +3 -0
pytorch_model-00005-of-00014.bin +3 -0
pytorch_model-00006-of-00014.bin +3 -0
pytorch_model-00007-of-00014.bin +3 -0
pytorch_model-00008-of-00014.bin +3 -0
pytorch_model-00009-of-00014.bin +3 -0
pytorch_model-00010-of-00014.bin +3 -0
pytorch_model-00011-of-00014.bin +3 -0
pytorch_model-00012-of-00014.bin +3 -0
pytorch_model-00013-of-00014.bin +3 -0
pytorch_model-00014-of-00014.bin +3 -0
pytorch_model.bin.index.json +714 -0
special_tokens_map.json +30 -0
tokenizer_config.json +192 -0
vocab.json +0 -0

README.md CHANGED

@@ -1,126 +1,215 @@
 ---
-license: apache-2.0
-language:
-- en
-pipeline_tag: text-generation
 ---
-<img alt="OLMo Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmo2/olmo.png" width="242px">
-OLMo 2 32B Instruct March 2025 is post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix), and finally RLVR training using [this data](https://huggingface.co/datasets/allenai/RLVR-GSM).
-Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
-Check out the [OLMo 2 paper](https://arxiv.org/abs/2501.00656) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
-OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
-These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details.
-## Model description
-- **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
-- **Language(s) (NLP):** Primarily English
-- **License:** Apache 2.0
-- **Finetuned from model:** allenai/OLMo-2-0325-32B
-### Model Sources
-- **Project Page:** https://allenai.org/olmo
-- **Repositories:**
-    - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core
-    - Evaluation code: https://github.com/allenai/olmes
-    - Further fine-tuning code: https://github.com/allenai/open-instruct
-- **Paper:** https://arxiv.org/abs/2501.00656
-- **Demo:** https://playground.allenai.org/
-## Installation
-OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using:
-```bash
-pip install --upgrade git+https://github.com/huggingface/transformers.git
-```
-## Using the model
-### Loading with HuggingFace
-To load the model with HuggingFace, use the following snippet:
-```
-from transformers import AutoModelForCausalLM
-olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-32B-SFT")
-```
-### Chat template
-The chat template for our models is formatted as:
-```
-<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
-```
-Or with new lines expanded:
-```
-<|endoftext|><|user|>
-How are you doing?
-<|assistant|>
-I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
-```
-It is embedded within the tokenizer as well, for `tokenizer.apply_chat_template`.
-### System prompt
-In Ai2 demos, we use this system prompt by default:
-```
-You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
-```
-The model has not been trained with a specific system prompt in mind.
-### Bias, Risks, and Limitations
-The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
-See the Falcon 180B model card for an example of this.
-## Performance
-| Model | Average | AlpacaEval | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
-|-------|---------|------------|-----|------|--------|---------|------|-------|---------|-------|---------|
-| **Open weights models** |
-| Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 |
-| Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
-| Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
-| Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 |
-| Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 |
-| Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 |
-| Qwen-2.5-14B-Instruct | 60.8 | 34.6 | 34.0 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 |
-| **Fully open models** |
-| OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 |
-| OLMo-7B-0424-Instruct | 33.1 | 8.5 | 34.4 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 |
-| OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 |
-| MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 |
-| *OLMo-2-7B-SFT* | 50.2 | 10.2 | 49.7 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 82.1 | 23.6 | 48.6 |
-| *OLMo-2-7B-DPO* | 54.2 | 27.9 | 46.7 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 81.0 | 23.5 | 56.0 |
-| *OLMo-2-13B-SFT* | 55.3 | 11.5 | 59.6 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 82.3 | 29.4 | 57.1 |
-| *OLMo-2-13B-DPO* | 60.6 | 38.3 | 57.9 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 79.7 | 29.0 | 63.9 |
-| **OLMo-2-7B-1124–Instruct** | 54.8 | 29.1 | 46.6 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 80.6 | 23.2 | 56.5 |
-| **OLMo-2-13B-1124-Instruct** | 62.0 | 39.5 | 58.8 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 79.1 | 28.8 | 64.3 |
-## License and use
-OLMo 2 is licensed under the Apache 2.0 license.
-OLMo 2 is intended for research and educational use.
-For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
-This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
-## Citation
-```bibtex
-@article{olmo20242olmo2furious,
-      title={2 OLMo 2 Furious},
-      author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Michal Guerquin and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
-      year={2024},
-      eprint={2501.00656},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2501.00656},
-}
-```

 ---
+language: en
+model-index:
+- name: allenai/open_instruct_dev
+  results:
+  - task:
+      type: preference_evaluation
+    dataset:
+      name: reward-bench
+      type: allenai/reward-bench
+    metrics:
+    - type: accuracy
+      value: 1.0
+    - type: accuracy
+      value: 1.0
+    - type: accuracy
+      value: 1.0
+    - type: accuracy
+      value: 1.0
 ---
+# Model Card for allenai/open_instruct_dev
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** en
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "_name_or_path": "allenai/OLMo-2-32B-6T-3x1-1x3",
+  "architectures": [
+    "Olmo2ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 100257,
+  "eos_token_id": 100257,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 27648,
+  "max_position_embeddings": 4096,
+  "model_type": "olmo2",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 64,
+  "num_key_value_heads": 8,
+  "pad_token_id": 100277,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 500000,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.47.1",
+  "use_cache": true,
+  "vocab_size": 100352
+}
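
The shape fields in this config determine the parameter count, and a quick check suggests they line up with the `total_size` of 64468559872 bytes reported in `pytorch_model.bin.index.json` in this commit. The sketch below is a back-of-the-envelope count, not code from the repository. It assumes the per-layer tensor shapes of the Olmo2 implementation in Transformers (in particular that `q_norm` spans all query heads, `k_norm` spans all key/value heads, and a final `model.norm` exists), and 2 bytes per parameter for `bfloat16`.

```python
# Rough parameter count derived from config.json.
# Tensor shapes for the norm layers are assumptions, not stated in the config.
hidden_size = 5120
intermediate_size = 27648
num_attention_heads = 40
num_key_value_heads = 8
num_hidden_layers = 64
vocab_size = 100352

head_dim = hidden_size // num_attention_heads  # 128
kv_dim = num_key_value_heads * head_dim        # 1024 (grouped-query attention)

attention = (
    hidden_size * hidden_size      # q_proj
    + 2 * hidden_size * kv_dim     # k_proj and v_proj
    + hidden_size * hidden_size    # o_proj
)
mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
norms = (
    2 * hidden_size  # post_attention_layernorm, post_feedforward_layernorm
    + hidden_size    # q_norm (assumed to cover all query heads)
    + kv_dim         # k_norm (assumed to cover all key/value heads)
)

per_layer = attention + mlp + norms
embeddings = 2 * vocab_size * hidden_size  # embed_tokens plus untied lm_head
final_norm = hidden_size                   # assumed model.norm weight

total_params = num_hidden_layers * per_layer + embeddings + final_norm
total_bytes = 2 * total_params  # bfloat16 stores 2 bytes per parameter

print(f"{total_params:,} parameters, {total_bytes:,} bytes")
```

Under these assumptions the count comes to 32,234,279,936 parameters, and the byte total equals the index file's `total_size`, which is some evidence that the assumed shapes are right.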

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 100257,
+  "eos_token_id": 100257,
+  "pad_token_id": 100277,
+  "transformers_version": "4.47.1"
+}

merges.txt ADDED

The diff for this file is too large to render.

metadata.json ADDED

	@@ -0,0 +1 @@

+ {"model_name": "0222_32B_finetune_6Tsoupmany_epoch_2_lr_4e-6_loss_sum", "model_type": "sft", "datasets": ["allenai/tulu-3-sft-olmo-2-mixture-0225"], "base_model": "allenai/OLMo-2-32B-6T-3x1-1x3", "wandb_path": "https://wandb.ai/ai2-llm/open_instruct_internal/runs/j8ebby0o", "beaker_experiment": "https://beaker.org/ex/01JMSSTQB8CZRBS8ZDEPARYQKG/", "beaker_datasets": ["https://beaker.org/ds/01JMSSTQCYKGWRTX3TAT4P0280", "https://beaker.org/ds/01JMSSTQJ48YMKHV51R1XH8H0C", "https://beaker.org/ds/01JMSSTQRCMX0M9F8FHN7K8MV2", "https://beaker.org/ds/01JMSSTQXJWHVP7BRBZJZPSCJ6", "https://beaker.org/ds/01JMSSTR33MT3EZSNBYHH6MJB1", "https://beaker.org/ds/01JMSSTR8GCK682AHKZVGK8NN0", "https://beaker.org/ds/01JMSSTRDJMSSF3FCAXZSDTG8D", "https://beaker.org/ds/01JMSSTRJPX7SE0PYEQZ3MX8BX"]}

pytorch_model-00001-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e767086d91fae4c4e32f46f3947b2c3abc15b20f004af5cd0468ef84a9791e77
+size 4991369072
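
Each `.bin` entry in this commit is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer and checking a downloaded shard against it follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, sha256 and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unexpected hash algorithm: {algo}")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's size and SHA-256 digest."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte shards are never held in memory.
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["sha256"]


# Pointer text copied from the diff for pytorch_model-00001-of-00014.bin.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e767086d91fae4c4e32f46f3947b2c3abc15b20f004af5cd0468ef84a9791e77\n"
    "size 4991369072\n"
)
print(pointer["size"])
```

Calling `matches_pointer` with the path of a downloaded shard and its parsed pointer would then confirm that the file is complete and uncorrupted.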

pytorch_model-00002-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4cafa15319d1efbb7684a204ced452d2d737f8025ab99b1e46db8f8feb8369c
+size 4938989440

pytorch_model-00003-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d323e3704621918d37c08bf23355f3ece15203b29810d7cd0c0e219e484e79c
+size 4876061328

pytorch_model-00004-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0051e7b0a8eb8c174e469f567c25d79d5c29c450fdddcd7bc8da7f8358779aac
+size 4876061328

pytorch_model-00005-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a0b130e5232cd7395296a56f93e3534835a7860931d0e8668b2a7a94ae7f7b7d
+size 4876061328

pytorch_model-00006-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fa7431b91217540f3f9c0ce40fdc395fdf9604d8b41232e44c30cf9aea451e5
+size 4876061328

pytorch_model-00007-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27f4a0ed790c9b96a464f3a4c2a8fa861ce892fb5029a1a22dd0dd8288a8a872
+size 4876061328

pytorch_model-00008-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e3d13b5204742a720bc456c6c96e7a26021b25ebb0c2678562cdda685df0263
+size 4876061328

pytorch_model-00009-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4b52e28eacd954651640134f54127ba2436e107fae1eb3f8d18fd91c31aca63a
+size 4876061328

pytorch_model-00010-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f63d250d07c8f2a0cfe258ad461e2ca50659fbaeb9bf02bfeb046b7d737e94eb
+size 4876061328

pytorch_model-00011-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d784139413567f023ee6cdd92010a69487060efb32a6ddc98bc8d1c82d0b6bd
+size 4876061328

pytorch_model-00012-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e34125c9ddaacdadbfc1e9e86ddaa195965eeabdf2ecf0193ed6e887a16dedc5
+size 4876061328

pytorch_model-00013-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54b79d023ab216c6072f3dc407d44bb11510c44c1b43be7f3e78a46725100576
+size 4750228410

pytorch_model-00014-of-00014.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35b74222904eb79c1dd5b5beca8fca243da8f4f27a0126477aa241e4d3d5d17a
+size 1027605893
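
The fourteen shard sizes listed in the pointers above can be summed and compared with the `total_size` value in the `metadata` block of `pytorch_model.bin.index.json` in this commit. The sketch below does only that arithmetic. Reading the difference as serialization overhead of the `.bin` container is an assumption, not something the commit states.

```python
# Shard sizes in bytes, copied from the Git LFS pointers in this commit.
shard_sizes = (
    [4991369072, 4938989440]      # shards 00001 and 00002
    + [4876061328] * 10           # shards 00003 through 00012
    + [4750228410, 1027605893]    # shards 00013 and 00014
)
index_total_size = 64468559872    # "total_size" in pytorch_model.bin.index.json

on_disk = sum(shard_sizes)
overhead = on_disk - index_total_size  # bytes not accounted for by tensor data

print(len(shard_sizes), on_disk, overhead)
```

The shards add up to 64,468,806,095 bytes on disk, 246,223 bytes more than `total_size`, so a full download needs roughly 64.5 GB of free space.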

pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,714 @@

+{
+  "metadata": {
+    "total_size": 64468559872
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00014-of-00014.bin",
+    "model.embed_tokens.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.post_feedforward_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.k_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.q_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.post_feedforward_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.k_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.q_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.post_feedforward_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.k_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.q_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.post_feedforward_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.k_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.q_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.post_feedforward_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.k_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.q_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.post_feedforward_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.k_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.q_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.14.post_feedforward_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.14.self_attn.k_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.self_attn.q_norm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.post_feedforward_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.k_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.q_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.post_feedforward_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.k_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.q_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.post_feedforward_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.k_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.q_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.post_feedforward_layernorm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.k_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.q_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.19.post_feedforward_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.19.self_attn.k_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.self_attn.q_norm.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00004-of-00014.bin",
+    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.post_feedforward_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.k_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.q_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.post_feedforward_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.k_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.q_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.post_feedforward_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.k_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.q_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.post_feedforward_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.k_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.q_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.post_feedforward_layernorm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.k_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.q_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.24.post_feedforward_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.24.self_attn.k_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.self_attn.q_norm.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00005-of-00014.bin",
+    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.post_feedforward_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.k_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.q_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.post_feedforward_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.k_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.q_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.post_feedforward_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.k_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.q_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.post_feedforward_layernorm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.k_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.q_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.29.post_feedforward_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.29.self_attn.k_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.self_attn.q_norm.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00006-of-00014.bin",
+    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.post_feedforward_layernorm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.k_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.q_norm.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.post_feedforward_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.k_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.q_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.post_feedforward_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.k_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.q_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.mlp.down_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.mlp.up_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.post_feedforward_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.k_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.q_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.mlp.down_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.mlp.up_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.post_feedforward_layernorm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.k_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.q_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.mlp.down_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.34.mlp.up_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.34.post_feedforward_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.34.self_attn.k_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.self_attn.q_norm.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00007-of-00014.bin",
+    "model.layers.35.mlp.down_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.mlp.up_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.post_feedforward_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.k_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.q_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.mlp.down_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.mlp.up_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.post_feedforward_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.k_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.q_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.mlp.down_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.mlp.up_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.post_feedforward_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.k_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.q_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.mlp.down_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.mlp.up_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.post_feedforward_layernorm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.k_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.q_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.mlp.down_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.39.mlp.up_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.39.post_feedforward_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.39.self_attn.k_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.self_attn.q_norm.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00008-of-00014.bin",
+    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.post_feedforward_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00014.bin",
+    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.40.mlp.down_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.mlp.gate_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.mlp.up_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.post_attention_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.post_feedforward_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.k_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.k_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.o_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.q_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.q_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.40.self_attn.v_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.mlp.down_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.mlp.gate_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.mlp.up_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.post_attention_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.post_feedforward_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.k_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.k_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.o_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.q_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.q_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.41.self_attn.v_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.mlp.down_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.mlp.gate_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.mlp.up_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.post_attention_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.post_feedforward_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.k_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.k_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.o_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.q_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.q_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.42.self_attn.v_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.mlp.down_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.mlp.gate_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.mlp.up_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.post_attention_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.post_feedforward_layernorm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.k_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.k_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.o_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.q_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.q_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.43.self_attn.v_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.mlp.down_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.44.mlp.gate_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.44.mlp.up_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.44.post_attention_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.44.post_feedforward_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.44.self_attn.k_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.self_attn.k_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.self_attn.o_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.self_attn.q_norm.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.self_attn.q_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.44.self_attn.v_proj.weight": "pytorch_model-00009-of-00014.bin",
+    "model.layers.45.mlp.down_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.mlp.gate_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.mlp.up_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.post_attention_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.post_feedforward_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.k_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.k_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.o_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.q_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.q_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.45.self_attn.v_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.mlp.down_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.mlp.gate_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.mlp.up_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.post_attention_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.post_feedforward_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.k_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.k_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.o_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.q_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.q_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.46.self_attn.v_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.mlp.down_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.mlp.gate_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.mlp.up_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.post_attention_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.post_feedforward_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.k_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.k_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.o_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.q_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.q_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.47.self_attn.v_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.mlp.down_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.mlp.gate_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.mlp.up_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.post_attention_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.post_feedforward_layernorm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.k_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.k_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.o_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.q_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.q_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.48.self_attn.v_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.mlp.down_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.49.mlp.gate_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.49.mlp.up_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.49.post_attention_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.49.post_feedforward_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.49.self_attn.k_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.self_attn.k_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.self_attn.o_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.self_attn.q_norm.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.self_attn.q_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.49.self_attn.v_proj.weight": "pytorch_model-00010-of-00014.bin",
+    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.post_feedforward_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.50.mlp.down_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.mlp.gate_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.mlp.up_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.post_attention_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.post_feedforward_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.k_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.k_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.o_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.q_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.50.self_attn.v_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.mlp.down_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.mlp.gate_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.mlp.up_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.post_attention_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.post_feedforward_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.k_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.k_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.o_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.q_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.51.self_attn.v_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.mlp.down_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.mlp.gate_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.mlp.up_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.post_attention_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.post_feedforward_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.k_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.k_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.o_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.q_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.52.self_attn.v_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.mlp.down_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.mlp.gate_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.mlp.up_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.post_attention_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.post_feedforward_layernorm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.k_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.k_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.o_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.q_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.53.self_attn.v_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.54.mlp.gate_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.54.mlp.up_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.54.post_attention_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.54.post_feedforward_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.54.self_attn.k_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.self_attn.k_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.self_attn.o_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.self_attn.q_norm.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.54.self_attn.v_proj.weight": "pytorch_model-00011-of-00014.bin",
+    "model.layers.55.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.mlp.gate_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.mlp.up_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.post_attention_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.post_feedforward_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.k_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.k_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.o_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.q_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.q_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.55.self_attn.v_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.mlp.gate_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.mlp.up_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.post_attention_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.post_feedforward_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.k_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.k_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.o_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.q_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.q_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.56.self_attn.v_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.mlp.gate_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.mlp.up_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.post_attention_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.post_feedforward_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.k_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.k_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.o_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.q_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.q_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.57.self_attn.v_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.mlp.gate_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.mlp.up_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.post_attention_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.post_feedforward_layernorm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.k_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.k_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.o_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.q_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.q_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.58.self_attn.v_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.mlp.down_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.59.mlp.gate_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.59.mlp.up_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.59.post_attention_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.59.post_feedforward_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.59.self_attn.k_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.self_attn.k_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.self_attn.o_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.self_attn.q_norm.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.self_attn.q_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.59.self_attn.v_proj.weight": "pytorch_model-00012-of-00014.bin",
+    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.post_feedforward_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.60.mlp.down_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.mlp.gate_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.mlp.up_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.post_attention_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.post_feedforward_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.k_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.k_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.o_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.q_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.q_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.60.self_attn.v_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.mlp.down_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.mlp.gate_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.mlp.up_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.post_attention_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.post_feedforward_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.k_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.k_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.o_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.q_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.q_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.61.self_attn.v_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.mlp.down_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.mlp.gate_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.mlp.up_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.post_attention_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.post_feedforward_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.k_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.k_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.o_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.q_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.q_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.62.self_attn.v_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.mlp.down_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.mlp.gate_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.mlp.up_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.post_attention_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.post_feedforward_layernorm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.k_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.k_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.o_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.q_norm.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.q_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.63.self_attn.v_proj.weight": "pytorch_model-00013-of-00014.bin",
+    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.post_feedforward_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.post_feedforward_layernorm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.9.post_feedforward_layernorm.weight": "pytorch_model-00003-of-00014.bin",
+    "model.layers.9.self_attn.k_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.self_attn.q_norm.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00014.bin",
+    "model.norm.weight": "pytorch_model-00013-of-00014.bin"
+  }
+}
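
The `weight_map` above assigns each tensor name to one shard file, and a single layer can straddle two shards (layer 54's attention weights are in shard 11 while its MLP weights are in shard 12). As a minimal sketch of how such an index could be inspected, the snippet below inverts a small sample of the mapping into a shard-to-tensors view. The helper name `tensors_by_shard` and the three-entry `sample` are illustrative assumptions, not part of the commit.

```python
from collections import defaultdict

def tensors_by_shard(weight_map):
    """Invert a {tensor_name: shard_file} mapping into {shard_file: [tensor_names]}."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    # Sort shards and tensor names so the result is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}

# Hypothetical sample: three entries copied from the index above.
sample = {
    "model.layers.54.mlp.down_proj.weight": "pytorch_model-00012-of-00014.bin",
    "model.layers.54.self_attn.q_proj.weight": "pytorch_model-00011-of-00014.bin",
    "model.norm.weight": "pytorch_model-00013-of-00014.bin",
}

grouped = tensors_by_shard(sample)
for shard, names in grouped.items():
    print(shard, names)
```

For the full file, the same function could be applied to the `weight_map` object after reading `pytorch_model.bin.index.json` with `json.load`.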

special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
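
In this map, `bos_token`, `eos_token`, and `unk_token` all share the content `<|endoftext|>`, while `pad_token` is a separate `<|pad|>` token. A small sketch of how those roles resolve to token IDs, using the IDs listed for these two strings under `added_tokens_decoder` in `tokenizer_config.json` (100257 and 100277). The dictionaries below are hand-copied, simplified stand-ins for the JSON files, and the variable names are assumptions for illustration.

```python
# Simplified copy of special_tokens_map.json: role -> token content.
special_tokens_map = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endoftext|>",
    "pad_token": "<|pad|>",
    "unk_token": "<|endoftext|>",
}

# Two entries taken from added_tokens_decoder in tokenizer_config.json: id -> content.
added_tokens_decoder = {
    "100257": "<|endoftext|>",
    "100277": "<|pad|>",
}

# Invert id -> content so each role can be looked up by its token string.
content_to_id = {content: int(token_id) for token_id, content in added_tokens_decoder.items()}
special_ids = {role: content_to_id[token] for role, token in special_tokens_map.items()}

print(special_ids)
```

Because three roles map to one ID, code that masks or strips `unk_token` would also affect end-of-text tokens. Padding uses a distinct ID.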

tokenizer_config.json ADDED

	@@ -0,0 +1,192 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<|extra_id_0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "|||PHONE_NUMBER|||",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100262": {
+      "content": "|||EMAIL_ADDRESS|||",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100263": {
+      "content": "|||IP_ADDRESS|||",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100264": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<|extra_id_1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100267": {
+      "content": "<|extra_id_2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100268": {
+      "content": "<|extra_id_3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100269": {
+      "content": "<|extra_id_4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100270": {
+      "content": "<|extra_id_5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100271": {
+      "content": "<|extra_id_6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100272": {
+      "content": "<|extra_id_7|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100273": {
+      "content": "<|extra_id_8|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100274": {
+      "content": "<|extra_id_9|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100275": {
+      "content": "<|extra_id_10|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n'  + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n'  + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
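
The `chat_template` above is a Jinja template. To make its branching easier to follow, here is a plain-Python sketch that mirrors the same logic: system and user turns end with a newline, assistant turns end with the EOS token (plus a newline unless they are the last message), and an `<|assistant|>` header is appended when a generation prompt is requested. The function name `render_chat` is hypothetical. This is a reading aid, not the code path the tokenizer library uses.

```python
def render_chat(messages, eos_token="<|endoftext|>", add_generation_prompt=False):
    """Plain-Python mirror of the Jinja chat_template in tokenizer_config.json."""
    parts = []
    last_index = len(messages) - 1
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append("<|system|>\n" + content + "\n")
        elif role == "user":
            parts.append("<|user|>\n" + content + "\n")
        elif role == "assistant":
            # A trailing newline is added only when another message follows.
            trailer = "" if i == last_index else "\n"
            parts.append("<|assistant|>\n" + content + eos_token + trailer)
        if i == last_index and add_generation_prompt:
            # Open an assistant turn for the model to complete.
            parts.append("<|assistant|>\n")
    return "".join(parts)

conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is 2 + 2?"},
]
prompt = render_chat(conversation, add_generation_prompt=True)
print(prompt)
```

Messages with any other role are skipped, as in the template, and an empty message list produces an empty string even when `add_generation_prompt` is set, because the generation prompt is emitted inside the loop.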

vocab.json ADDED

The diff for this file is too large to render.