lazarevich committed on Mar 26

Commit

d153e2b

0 Parent(s):

release model


Files changed (18)

LICENSE +114 -0
NOTICE +3 -0
README.md +162 -0
cb_tokenizer.py +61 -0
config.json +54 -0
configuration_cbllama.py +42 -0
generation_config.json +12 -0
helmet_result.png +0 -0
lm_infinite_attention.py +244 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +298 -0
modeling_cbllama.py +183 -0
special_tokens_map.json +18 -0
tokenizer.json +0 -0
tokenizer_config.json +2069 -0

LICENSE ADDED

	@@ -0,0 +1,114 @@

+LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+Llama 3.1 Version Release Date: July 23, 2024
+“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+Llama Materials set forth herein.
+“Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
+distributed by Meta at https://llama.meta.com/doc/overview.
+“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+regulations to provide legal consent and that has legal authority to bind your employer or such other
+person or entity if you are entering in this Agreement on their behalf.
+“Llama 3.1” means the foundational large language models and software and algorithms, including
+machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+https://llama.meta.com/llama-downloads.
+“Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
+portion thereof) made available under this Agreement.
+“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+outside of the EEA or Switzerland).
+By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+you agree to be bound by this Agreement.
+1. License Rights and Redistribution.
+  a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+Llama Materials.
+  b. Redistribution and Use.
+      i. If you distribute or make available the Llama Materials (or any derivative works
+thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
+provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
+Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
+the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
+the beginning of any such AI model name.
+      ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+      iii. You must retain in all copies of the Llama Materials that you distribute the following
+attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
+licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
+Reserved.”
+      iv. Your use of the Llama Materials must comply with applicable laws and regulations
+(including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
+reference into this Agreement.
+2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
+of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+million monthly active users in the preceding calendar month, you must request a license from Meta,
+which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+RESULTS.
+4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+5. Intellectual Property.
+  a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+or any of its affiliates, except as required for reasonable and customary use in describing and
+redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+comply with Meta’s brand guidelines (currently accessible at
+https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+of the Mark will inure to the benefit of Meta.
+  b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+respect to any derivative works and modifications of the Llama Materials that are made by you, as
+between you and Meta, you are and will be the owner of such derivative works and modifications.
+  c. If you institute litigation or other proceedings against Meta or any entity (including a
+cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
+results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+harmless Meta from and against any claim by any third party arising out of or related to your use or
+distribution of the Llama Materials.
+6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+Agreement.
+7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+the State of California without regard to choice of law principles, and the UN Convention on Contracts
+for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+exclusive jurisdiction of any dispute arising out of this Agreement.

NOTICE ADDED

	@@ -0,0 +1,3 @@

+Llama 3.1 is licensed under the Llama 3.1 Community License,
+Copyright © Meta Platforms, Inc.
+All Rights Reserved.

README.md ADDED

	@@ -0,0 +1,162 @@

+---
+license: other
+language:
+- en
+pipeline_tag: text-generation
+tags:
+- cerebras
+- sparse-attention
+- llama-3
+- pytorch
+---
+# Llama-3-CBHybridL-8B: Model Information
+We are excited to release the Cerebras hybrid dense/sparse attention versions of Llama-3.1-8B-Instruct, optimized for long-context performance. This series includes two models: Llama-3-CBHybridL-8B (25 sparse attention layers out of 32) and Llama-3-CBHybridM-8B (28 sparse attention layers out of 32).
+This model, Cerebras Llama-3-CBHybridL-8B, was built on top of Llama-3.1-8B-Instruct using the sparse attention training features available in Cerebras Model Zoo Release 2.4. We created hybrid versions of Llama-3.1-8B-Instruct in which most of the self-attention layers are fine-tuned to perform sparse lambda-mask attention, which reduces KV cache memory usage by 1.6-1.7x while largely maintaining long-context performance.
+You can find more information about Cerebras hybrid Llama models at the following locations:
+* [Blog post](https://www.cerebras.ai/blog/compressing-kv-cache-memory-by-half-with-sparse-attention)
+* [Llama-3-CBHybridL-8B model on HuggingFace](https://huggingface.co/cerebras/Llama-3-CBHybridL-8B)
+* [Llama-3-CBHybridM-8B model on HuggingFace](https://huggingface.co/cerebras/Llama-3-CBHybridM-8B)
+## Results
+Our hybrid models retain most of their long-context performance while requiring much less KV cache memory:
+![HELMET result](./helmet_result.png)
+|    LongBench suite    | Llama-3.1-8B-Instruct | Llama-3-CBHybridM-8B | Llama-3-CBHybridL-8B |
+|-----------------------|-----------------------|----------------------|----------------------|
+| KV cache memory*, GB  | 2.147                 | 1.275                | 1.376                |
+| Single-doc QA         | 54.197                | 54.507               | 56.187               |
+| Multi-doc QA          | 41.455                | 41.022               | 43.082               |
+| Summarization         | 26.1275               | 25.607               | 25.357               |
+| Few-shot learning     | 63.4075               | 64.42                | 65.183               |
+| Synthetic             | 97.29                 | 96.75                | 98.0                 |
+| Code completion       | 59.745                | 66.865               | 66.49                |
+| Macro-mean (EN & ZH)  | 57.037                | 58.195               | 59.05                |
+| Macro-mean (EN)       | 58.606                | 60.485               | 60.937               |
+| HELMET suite (seq. len. 16K)  | Llama-3.1-8B-Instruct | Llama-3-CBHybridM-8B | Llama-3-CBHybridL-8B |
+|-----------------------|----------------------|----------------------|----------------------|
+| KV cache memory, GB  | 2.147                | 1.275                | 1.376                |
+| Recall               | 99.6875              | 87.5625              | 95.1875              |
+| Rerank               | 52.6671              | 42.7879              | 45.5175              |
+| RAG                  | 69.0417              | 68.625               | 69.4583              |
+| LongdocQA            | 32.061               | 34.419               | 35.2879              |
+| ICL                  | 76                   | 81.6                 | 82.2                 |
+| Summarization        | 26.278               | 22.4353              | 23.7324              |
+| Macro-mean           | 59.2892              | 56.2382              | 58.564               |
+\* KV cache memory usage is reported at a representative sequence length of 16K. Note, however, that samples across LongBench tasks vary in length, with ~14.5K being the 75th percentile of the sample length distribution.
+## Example Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "cerebras/Llama-3-CBHybridL-8B"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+messages = [
+    {"role": "system", "content": "You are a wafer-scale chatbot who always responds in wafer speak!"},
+    {"role": "user", "content": "Who are you?"},
+]
+input_ids = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to(model.device)
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=256,
+)
+response = outputs[0][input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
+### Adding memory tokens for enhanced long-context performance
+We found that adding auxiliary memory tokens to input sequences at regular intervals improves long-context performance. These tokens can be inserted into the input sequence with the helper method `tokenizer.insert_memory_tokens()`, as shown below:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "cerebras/Llama-3-CBHybridL-8B"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+messages = [
+    {"role": "system", "content": "You are a wafer-scale chatbot who always responds in wafer speak!"},
+    {"role": "user", "content": "Who are you?"},
+]
+input_ids = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to(model.device)
+# Inserting 8 memory tokens per 256 tokens of original input:
+input_ids = tokenizer.insert_memory_tokens(
+    input_ids,
+    episode_length=256,
+    num_memory_tokens_per_episode=8
+)
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=256,
+)
+response = outputs[0][input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
+In our ablations, inserting 8 memory tokens after every 256 tokens of original input resulted in the best accuracy. See our [blog post](https://www.cerebras.ai/blog/compressing-kv-cache-memory-by-half-with-sparse-attention) for more details.
+## License
+Built with Llama3. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
+[Llama3.1 Community License](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE)
+[Acceptable Use Policy](https://www.llama.com/llama3_1/use-policy/)
+## Acknowledgements
+Our models are fine-tuned versions of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct). The sparse attention mechanism used in the `Llama-3-CBHybrid` model series is from the [LM-Infinite](https://github.com/Glaciohound/LM-Infinite) work of Han et al. See our [blog post](https://www.cerebras.ai/blog/compressing-kv-cache-memory-by-half-with-sparse-attention) for the full list of references.
+## Citing this work
+```bibtex
+@misc{cerebras2025cb-hybrid-llama,
+  author       = {Lazarevich, Ivan and Hassanpour, Mohammad and Venkatesh, Ganesh},
+  title        = {Compressing KV cache memory by half with sparse attention},
+  month        = {March},
+  year         = {2025},
+  howpublished = {\url{https://www.cerebras.ai/blog/compressing-kv-cache-memory-by-half-with-sparse-attention}}
+}
+```

cb_tokenizer.py ADDED

	@@ -0,0 +1,61 @@

+import torch
+from transformers import PreTrainedTokenizerFast
+class CBPreTrainedTokenizerFast(PreTrainedTokenizerFast):
+    def insert_memory_tokens(
+        self, input_ids, episode_length=256, num_memory_tokens_per_episode=8
+    ):
+        """
+        Inserts memory tokens into the input sequence at regular intervals.
+        This function divides the input sequence into chunks of `episode_length`
+        tokens and inserts `num_memory_tokens_per_episode` memory tokens between
+        consecutive chunks. Each inserted block consists of
+        `num_memory_tokens_per_episode - 1` `<|memory|>` tokens followed by one
+        `<|memory_end|>` token.
+        Args:
+            input_ids (torch.Tensor):
+                A tensor of shape `(batch_size, seq_len)` containing tokenized input sequences.
+            episode_length (int, optional):
+                The maximum length of each episode before inserting memory tokens. Default is `256`.
+            num_memory_tokens_per_episode (int, optional):
+                The number of memory tokens to insert between episodes. Default is `8`.
+        Returns:
+            torch.Tensor:
+                A tensor of shape `(batch_size, new_seq_len)`, where `new_seq_len`
+                includes the inserted memory tokens.
+        """
+        memory_id = self.added_tokens_encoder["<|memory|>"]
+        memory_end_id = self.added_tokens_encoder["<|memory_end|>"]
+        batch_size, seq_len = input_ids.shape
+        device = input_ids.device
+        memory_episode_ids = torch.tensor(
+            [memory_id] * (num_memory_tokens_per_episode - 1) + [memory_end_id],
+            dtype=input_ids.dtype,
+            device=input_ids.device,
+        )
+        output_chunks = []
+        i = 0
+        while i < seq_len:
+            # Extract the current chunk
+            chunk = input_ids[
+                :, i : i + episode_length
+            ]  # Shape: (batch_size, current_chunk_len)
+            output_chunks.append(chunk)
+            i += episode_length
+            # Append memory_ids if there are more chunks to process
+            if i < seq_len:
+                # Expand memory_ids to match batch_size
+                memory_ids_batch = memory_episode_ids.unsqueeze(0).expand(
+                    batch_size, -1
+                )  # Shape: (batch_size, num_memory_tokens_per_episode)
+                output_chunks.append(memory_ids_batch)
+        # Concatenate all chunks along the sequence dimension
+        new_input_ids = torch.cat(output_chunks, dim=1)
+        return new_input_ids.to(device)
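
Note (not part of the commit): the loop in `insert_memory_tokens` adds one block of memory tokens after every chunk except the last, so the output length can be written in closed form. The sketch below is only an illustration of that arithmetic; the helper name `length_with_memory_tokens` is hypothetical and does not exist in `cb_tokenizer.py`, and it assumes a non-empty input.

```python
import math


def length_with_memory_tokens(
    seq_len, episode_length=256, num_memory_tokens_per_episode=8
):
    """Sequence length after memory-token insertion (illustrative helper)."""
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    # Number of chunks the input is split into; the last one may be shorter.
    num_chunks = math.ceil(seq_len / episode_length)
    # One memory block sits between consecutive chunks, none after the last.
    return seq_len + (num_chunks - 1) * num_memory_tokens_per_episode
```

With the defaults, a 16384-token input is split into 64 chunks and grows by 63 * 8 = 504 tokens, to 16888. An input of 256 tokens or fewer is returned unchanged.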

config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "architectures": [
+    "CBHybridLlamaForCausalLM"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_cbllama.CBLlamaConfig",
+    "AutoModelForCausalLM": "modeling_cbllama.CBHybridLlamaForCausalLM"
+  },
+  "sliding_window_size": 8192,
+  "num_sink_tokens": 512,
+  "dense_attn_layer_indices": [
+    2,
+    5,
+    10,
+    13,
+    16,
+    17,
+    18
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 33792,
+  "mlp_bias": false,
+  "model_type": "cbllama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.45.2",
+  "use_cache": true,
+  "vocab_size": 128256
+}
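
Note (not part of the commit): the values in this config are enough for a rough estimate of the KV cache sizes quoted in the README. The sketch below is a back-of-the-envelope calculation under assumptions that the source does not state: each cached value takes 2 bytes (bfloat16), a sparse layer keeps at most `num_sink_tokens + sliding_window_size` tokens, the hybrid model's 16K input carries 8 memory tokens per 256-token episode, and GB means 10^9 bytes. The function name `kv_cache_bytes` is hypothetical.

```python
# Values copied from config.json.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
SLIDING_WINDOW_SIZE = 8192
NUM_SINK_TOKENS = 512
DENSE_ATTN_LAYER_INDICES = [2, 5, 10, 13, 16, 17, 18]

BYTES_PER_VALUE = 2  # assumption: bfloat16 cache


def kv_cache_bytes(seq_len, num_dense_layers):
    """Estimate KV cache size in bytes for one sequence (assumed accounting)."""
    # Keys plus values for one token in one layer.
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    # Assumption: a sparse layer caches only the sink prefix and the local window.
    sparse_tokens = min(seq_len, NUM_SINK_TOKENS + SLIDING_WINDOW_SIZE)
    num_sparse_layers = NUM_LAYERS - num_dense_layers
    cached_tokens = num_dense_layers * seq_len + num_sparse_layers * sparse_tokens
    return cached_tokens * per_token_per_layer


# Fully dense baseline at 16K tokens.
dense_gb = kv_cache_bytes(16384, NUM_LAYERS) / 1e9

# Hybrid model: 16K tokens plus 63 blocks of 8 memory tokens (assumption).
hybrid_len = 16384 + 63 * 8
hybrid_gb = kv_cache_bytes(hybrid_len, len(DENSE_ATTN_LAYER_INDICES)) / 1e9

print(f"dense: {dense_gb:.3f} GB, hybrid: {hybrid_gb:.3f} GB")
```

Under these assumptions the estimates round to 2.147 GB and 1.376 GB, which agrees with the README table for Llama-3.1-8B-Instruct and Llama-3-CBHybridL-8B. The seven dense indices also leave 25 sparse layers out of 32, matching the README description.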

configuration_cbllama.py ADDED

	@@ -0,0 +1,42 @@

+from transformers.models.llama.configuration_llama import LlamaConfig
+class CBLlamaConfig(LlamaConfig):
+    """
+    Configuration class for CBLlama, a Cerebras modified version of the Llama model where
+    all layers use sparse LM-Infinite-style attention, except for specific layers defined by `dense_attn_layer_indices`.
+    Sparse attention is controlled by two parameters:
+    - `sliding_window_size`: Defines the size of the sliding window for local attention.
+    - `num_sink_tokens`: Specifies the number of sink tokens (prefix that can be always attended to).
+    Args:
+        sliding_window_size (int, optional): The size of the sliding window for sparse attention.
+            Defaults to 8192.
+        num_sink_tokens (int, optional): The number of sink tokens used in sparse attention.
+            Defaults to 512.
+        dense_attn_layer_indices (list[int] or None, optional): Indices of layers that use dense
+            attention instead of sparse. If None, all layers use sparse attention. Defaults to None.
+        lm_inf_headwise_limit (int, optional): If the input sequence is longer than `lm_inf_headwise_limit`,
+            a slower but more memory-efficient for-loop over heads is used.
+            Defaults to `2.5 * sliding_window_size`.
+        **kwargs: Additional arguments passed to the base `LlamaConfig` class.
+    """
+    def __init__(
+        self,
+        sliding_window_size=8192,
+        num_sink_tokens=512,
+        dense_attn_layer_indices=None,
+        lm_inf_headwise_limit=None,
+        **kwargs,
+    ):
+        self.sliding_window_size = sliding_window_size
+        self.num_sink_tokens = num_sink_tokens
+        self.dense_attn_layer_indices = dense_attn_layer_indices
+        self.lm_inf_headwise_limit = (
+            lm_inf_headwise_limit
+            if lm_inf_headwise_limit is not None
+            else 2.5 * sliding_window_size
+        )
+        super().__init__(**kwargs)
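
Note (not part of the commit): the two sparse-attention parameters documented above define a lambda-shaped attention mask, in which each query sees a fixed prefix of sink tokens plus a local window of recent tokens. The sketch below builds that mask with plain Python lists for small sizes. It is an illustration only: the function name `lambda_mask` is hypothetical, the window is assumed to include the current token, and the sketch ignores how positions are encoded for the sink and local branches.

```python
def lambda_mask(seq_len, sliding_window_size, num_sink_tokens):
    """Return mask[q][k], True where query q may attend to key k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                          # never attend to the future
            in_window = q - k < sliding_window_size  # recent tokens, incl. q itself
            is_sink = k < num_sink_tokens            # always-visible prefix
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask


# Tiny example: 6 tokens, window of 2, 1 sink token.
for row in lambda_mask(6, sliding_window_size=2, num_sink_tokens=1):
    print("".join("x" if allowed else "." for allowed in row))
```

Each row allows at most `sliding_window_size + num_sink_tokens` keys, which is why the KV cache of a sparse layer stops growing once the sequence exceeds that length.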

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.45.2"
+}

helmet_result.png ADDED

lm_infinite_attention.py ADDED

	@@ -0,0 +1,244 @@

+# This code is based on https://github.com/Glaciohound/LM-Infinite by Chi Han
+# Licensed under the MIT License
+import torch
+def pad_sequence_to_length(tensor, length, value):
+    # Pad the sequence dimension (-2) up to `length`, filling with `value`.
+    return torch.cat(
+        (
+            tensor,
+            torch.full(
+                (
+                    *tensor.shape[:-2],
+                    length - tensor.shape[-2],
+                    tensor.shape[-1],
+                ),
+                value,
+                dtype=tensor.dtype,
+                device=tensor.device,
+            ),
+        ),
+        dim=-2,
+    )
+def blockwise_sequence(sequence, block_size):
+    pad_to_length = ((sequence.shape[-2] - 1) // block_size + 1) * block_size
+    padded = pad_sequence_to_length(sequence, pad_to_length, 0)
+    blockwise = padded.view(
+        *sequence.shape[:-2], pad_to_length // block_size, block_size, -1
+    )
+    return blockwise
+def shift_and_pair(blockwise):
+    return torch.cat(
+        (
+            torch.cat((blockwise[..., -1, None, :, :], blockwise[..., :-1, :, :],), -3),
+            blockwise,
+        ),
+        -2,
+    )
+class LambdaMatmul:
+    def __init__(
+        self,
+        key_rot,
+        key_stationary,
+        query_rot,
+        query_stationary,
+        local_branch,
+        global_branch,
+    ):
+        query_length = query_rot.shape[-2]
+        key_length = key_rot.shape[-2]
+        embed_dim = key_rot.shape[-1]
+        dtype = key_rot.dtype
+        device = key_rot.device
+        min_value = torch.finfo(dtype).min
+        if query_length == 1:
+            self.mode = "single_query"
+        elif query_length < key_length:
+            assert query_length + local_branch + global_branch == key_length
+            self.mode = "cached"
+        elif query_length <= local_branch:
+            self.mode = "short_seq"
+        else:
+            self.mode = "long_seq"
+        attn_stationary = torch.matmul(
+            query_stationary, key_stationary[..., :global_branch, :].transpose(-1, -2)
+        )
+        attn_stationary = torch.where(
+            torch.ones(attn_stationary.shape[-2:], dtype=torch.bool)
+            .to(device)
+            .triu(-local_branch + 1 + key_length - query_length),
+            min_value,
+            attn_stationary,
+        )
+        if self.mode == "short_seq":
+            attn_rot = torch.matmul(query_rot, key_rot.transpose(-1, -2))
+            attn_rot = torch.where(
+                torch.ones(attn_rot.shape[-2:], dtype=torch.bool).to(device).triu(1),
+                min_value,
+                attn_rot,
+            )
+            self.attn = attn_rot
+            self.attn[..., :, :global_branch] = torch.where(
+                attn_stationary > min_value / 2,
+                attn_stationary,
+                attn_rot[..., :, :global_branch],
+            )
+        elif self.mode == "single_query":
+            attn_rot = torch.matmul(
+                query_rot,
+                key_rot[..., max(0, key_length - local_branch) :, :].transpose(-1, -2),
+            )
+            self.attn = torch.cat((attn_stationary, attn_rot), -1)
+        elif self.mode == "long_seq":
+            pad_to_length = ((query_length - 1) // local_branch + 1) * local_branch
+            patch_size = pad_to_length - query_length
+            segmented_query_rot = blockwise_sequence(query_rot, local_branch)
+            segmented_key_rot = blockwise_sequence(key_rot, local_branch)
+            segmented_key_rot = shift_and_pair(segmented_key_rot)
+            attn_rot = torch.matmul(
+                segmented_query_rot, segmented_key_rot.transpose(-1, -2)
+            )
+            attn_rot = torch.where(
+                torch.ones((local_branch, 2 * local_branch), dtype=torch.bool)
+                .to(device)
+                .triu(1)
+                .tril(local_branch)
+                .logical_not(),
+                min_value,
+                attn_rot,
+            )
+            attn_rot[..., 0, :, :local_branch] = min_value
+            if patch_size != 0:
+                attn_rot[..., -1, -patch_size:, :] = min_value
+                attn_rot[..., -1, :, -patch_size:] = min_value
+            attn_rot = attn_rot.view(query_rot.shape[:-2] + (-1, local_branch * 2))
+            attn_stationary = pad_sequence_to_length(
+                attn_stationary, pad_to_length, min_value
+            )
+            self.pad_to_length = pad_to_length
+            self.attn = torch.cat((attn_stationary, attn_rot), -1)
+        elif self.mode == "cached":
+            pad_to_length = ((query_length - 1) // local_branch + 1) * local_branch
+            patch_size = pad_to_length - query_length
+            segmented_query_rot = blockwise_sequence(query_rot, local_branch)
+            segmented_key_rot = blockwise_sequence(
+                key_rot[..., global_branch:, :], local_branch
+            )
+            segmented_key_rot = shift_and_pair(segmented_key_rot)[..., 1:, :, :]
+            attn_rot = torch.matmul(
+                segmented_query_rot, segmented_key_rot.transpose(-1, -2)
+            )
+            attn_rot = torch.where(
+                torch.ones((local_branch, 2 * local_branch), dtype=torch.bool)
+                .to(device)
+                .triu(1)
+                .tril(local_branch)
+                .logical_not(),
+                min_value,
+                attn_rot,
+            )
+            if patch_size != 0:
+                attn_rot[..., -1, -patch_size:, :] = min_value
+                attn_rot[..., -1, :, -patch_size:] = min_value
+            attn_rot = attn_rot.view(query_rot.shape[:-2] + (-1, local_branch * 2))
+            attn_stationary = pad_sequence_to_length(
+                attn_stationary, pad_to_length, min_value
+            )
+            self.pad_to_length = pad_to_length
+            self.attn = torch.cat((attn_stationary, attn_rot), -1)
+        else:
+            raise NotImplementedError()
+        self.query_length = query_length
+        self.key_length = key_length
+        self.min_value = min_value
+        self.global_branch = global_branch
+        self.local_branch = local_branch
+        self.embed_dim = embed_dim
+    def __truediv__(self, scalar):
+        self.attn.div_(scalar).clamp_(min=self.min_value)
+        return self
+    def __mul__(self, scalar):
+        self.attn.mul_(scalar).clamp_(min=self.min_value)
+        return self
+    def local_branch_add(self, other):
+        self.attn[..., -self.local_branch * 2 :].add_(other).clamp_(min=self.min_value)
+        return self
+    def global_branch_add(self, other):
+        self.attn[..., : -self.local_branch * 2].add_(other).clamp_(min=self.min_value)
+        return self
+    def dropout(self, dropout):
+        self.attn = dropout(self.attn)
+        return self
+    def softmax(self):
+        self.attn = self.attn.softmax(-1)
+        return self
+    def to(self, destination):
+        self.attn = self.attn.to(destination)
+        return self
+    def matmul(self, value):
+        if self.mode == "short_seq":
+            output = torch.matmul(self.attn, value)
+        elif self.mode == "single_query":
+            output = torch.matmul(
+                self.attn,
+                torch.cat(
+                    (
+                        value[..., : self.global_branch, :],
+                        value[..., max(0, self.key_length - self.local_branch) :, :],
+                    ),
+                    -2,
+                ),
+            )
+        elif self.mode == "long_seq":
+            segmented_value = shift_and_pair(
+                blockwise_sequence(value, self.local_branch)
+            )
+            output_stationary = torch.matmul(
+                self.attn[..., : self.query_length, : self.global_branch],
+                value[..., : self.global_branch, :],
+            )
+            output_rot = torch.matmul(
+                self.attn[..., self.global_branch :].view(
+                    self.attn.shape[:-2]
+                    + (-1, self.local_branch, self.local_branch * 2)
+                ),
+                segmented_value,
+            ).view(self.attn.shape[:-2] + (self.pad_to_length, -1))
+            output = output_stationary + output_rot[..., : self.query_length, :]
+        elif self.mode == "cached":
+            segmented_value = blockwise_sequence(
+                value[..., self.global_branch :, :], self.local_branch
+            )
+            segmented_value = shift_and_pair(segmented_value)[..., 1:, :, :]
+            output_stationary = torch.matmul(
+                self.attn[..., : self.query_length, : self.global_branch],
+                value[..., : self.global_branch, :],
+            )
+            output_rot = torch.matmul(
+                self.attn[..., self.global_branch :].view(
+                    self.attn.shape[:-2]
+                    + (-1, self.local_branch, self.local_branch * 2)
+                ),
+                segmented_value,
+            ).view(self.attn.shape[:-2] + (self.pad_to_length, -1))
+            output = output_stationary + output_rot[..., : self.query_length, :]
+        else:
+            raise NotImplementedError
+        del self.attn
+        return output
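
Note (not part of the commit): `LambdaMatmul.__init__` picks one of four code paths from the query and key lengths, and the blockwise paths first round lengths up to a multiple of the local window. The sketch below restates just that control flow and rounding in plain Python so the cases are easier to follow. The function names `select_mode` and `padded_length` are hypothetical, and the sketch omits the assertion that the real "cached" path makes on the key length.

```python
def select_mode(query_length, key_length, local_branch):
    """Mirror the branch order used in LambdaMatmul.__init__ (illustrative)."""
    if query_length == 1:
        return "single_query"  # e.g. one decoding step
    if query_length < key_length:
        return "cached"  # several new queries, more keys than queries
    if query_length <= local_branch:
        return "short_seq"  # sequence fits inside the local window
    return "long_seq"  # blockwise local attention plus the global prefix


def padded_length(length, block_size):
    """Round `length` up to a multiple of `block_size`, as blockwise_sequence does."""
    return ((length - 1) // block_size + 1) * block_size


print(select_mode(4096, 4096, 8192))   # short_seq
print(select_mode(10000, 10000, 8192)) # long_seq
print(padded_length(10000, 8192))      # 16384
```

In this file `local_branch` and `global_branch` are constructor arguments; how they are derived from `sliding_window_size` and `num_sink_tokens` is decided in `modeling_cbllama.py`, which is not shown in this section.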

model-00001-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e225b7e74829d09451bb99d64616c59357f8637cf7d0300ae33096a88682bfc6
+size 4976698672

model-00002-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abfe8adebf0451e198a217822f6b4703c459220bdfc9ce1be0c222fc8ba9e959
+size 4999802720

model-00003-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3be4be1b8320be2918daf164b9acc408dabec8a880733b1dc6ee176352e265a
+size 4915916176

model-00004-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c875c8208a8e53cdeeceb92d74bebb7ec267e38a2d146c52258a23e2bf41cd0
+size 1168138808
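
The four shard files above are Git LFS pointers, each carrying only a spec version, a SHA-256 object id, and a byte size. As a rough consistency check, the sketch below parses the pointers and compares the summed shard sizes with the `total_size` recorded in `model.safetensors.index.json`. `parse_lfs_pointer` is a hypothetical helper name, and the interpretation of the leftover bytes is an assumption.

```python
# Pointer texts copied from the four shard hunks above.
POINTERS = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e225b7e74829d09451bb99d64616c59357f8637cf7d0300ae33096a88682bfc6\n"
    "size 4976698672\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:abfe8adebf0451e198a217822f6b4703c459220bdfc9ce1be0c222fc8ba9e959\n"
    "size 4999802720\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b3be4be1b8320be2918daf164b9acc408dabec8a880733b1dc6ee176352e265a\n"
    "size 4915916176\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4c875c8208a8e53cdeeceb92d74bebb7ec267e38a2d146c52258a23e2bf41cd0\n"
    "size 1168138808\n",
)

# "total_size" from the metadata block of model.safetensors.index.json.
INDEX_TOTAL_SIZE = 16060522496


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file; size becomes an int."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


shard_total = sum(parse_lfs_pointer(p)["size"] for p in POINTERS)
overhead = shard_total - INDEX_TOTAL_SIZE

if __name__ == "__main__":
    print(f"shard bytes on disk: {shard_total}")
    print(f"index total_size:    {INDEX_TOTAL_SIZE}")
    print(f"difference:          {overhead}")
```

The shard files add up to slightly more than `total_size`. That would be consistent with `total_size` counting tensor data only while each file also stores a header, though this commit does not confirm that reading.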

model.safetensors.index.json ADDED

	@@ -0,0 +1,298 @@

+{
+  "metadata": {
+    "total_size": 16060522496
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
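
The `weight_map` above assigns every tensor to one shard. Its keys are sorted as strings, which is why `model.layers.10` follows `model.layers.1`, and a few layers (20 and 31 in this index) have tensors in two different shards. The sketch below groups a small excerpt of the map by layer to surface such splits. `shards_per_layer` and `split_layers` are hypothetical helper names, and the excerpt is a hand-picked subset of the entries above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hand-picked subset of the weight_map entries shown above.
WEIGHT_MAP_EXCERPT = {
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}

LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map: Dict[str, str]) -> Dict[int, Set[str]]:
    """Map each decoder layer index to the set of shard files holding it."""
    shards: Dict[int, Set[str]] = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_KEY.match(name)
        if match:  # skips non-layer tensors such as model.norm.weight
            shards[int(match.group(1))].add(shard)
    return dict(shards)


def split_layers(weight_map: Dict[str, str]) -> List[int]:
    """Layer indices whose tensors are spread over more than one shard."""
    grouped = shards_per_layer(weight_map)
    return sorted(layer for layer, files in grouped.items() if len(files) > 1)


if __name__ == "__main__":
    print(split_layers(WEIGHT_MAP_EXCERPT))
```

Running the same helpers on the full JSON (for example after `json.load`) would need no changes, since they only rely on the key naming pattern.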

modeling_cbllama.py ADDED

	@@ -0,0 +1,183 @@

+# This code is based on https://github.com/Glaciohound/LM-Infinite by Chi Han
+# Licensed under the MIT License
+import math
+from typing import Optional, Tuple
+import torch
+import torch.nn as nn
+from transformers.cache_utils import Cache
+from transformers.models.llama.modeling_llama import (
+    LlamaAttention,
+    LlamaDecoderLayer,
+    LlamaModel,
+    LlamaForCausalLM,
+    rotate_half,
+    repeat_kv,
+)
+from .configuration_cbllama import CBLlamaConfig
+from .lm_infinite_attention import LambdaMatmul
+def apply_rotary_pos_emb(vec, cos, sin, position_ids):
+    # cos and sin arrive as [1, seq_len, dim]; squeeze the leading dimension of size 1.
+    cos = cos.squeeze(0)  # [seq_len, dim]
+    sin = sin.squeeze(0)  # [seq_len, dim]
+    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
+    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
+    vec_embed = (vec * cos) + (rotate_half(vec) * sin)
+    return vec_embed
+class CBSparseLlamaAttention(LlamaAttention):
+    def __init__(self, config: CBLlamaConfig, layer_idx: int):
+        super().__init__(config, layer_idx)
+        self.sliding_window_size = config.sliding_window_size
+        self.num_sink_tokens = config.num_sink_tokens
+        self.limit_distance = config.sliding_window_size
+        self.headwise_limit = config.lm_inf_headwise_limit
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        attention_mask: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.LongTensor] = None,
+        past_key_value: Optional[Cache] = None,
+        output_attentions: bool = False,
+        use_cache: bool = False,
+        cache_position: Optional[torch.LongTensor] = None,
+        position_embeddings: Optional[
+            Tuple[torch.Tensor, torch.Tensor]
+        ] = None,  # will become mandatory in v4.46
+        **kwargs,
+    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
+        bsz, q_len, _ = hidden_states.size()
+        query_states = self.q_proj(hidden_states)
+        key_states = self.k_proj(hidden_states)
+        value_states = self.v_proj(hidden_states)
+        query_states = query_states.view(
+            bsz, q_len, self.num_heads, self.head_dim
+        ).transpose(1, 2)
+        key_states = key_states.view(
+            bsz, q_len, self.num_key_value_heads, self.head_dim
+        ).transpose(1, 2)
+        value_states = value_states.view(
+            bsz, q_len, self.num_key_value_heads, self.head_dim
+        ).transpose(1, 2)
+        if past_key_value is not None:
+            key_states, value_states = past_key_value.update(
+                key_states, value_states, self.layer_idx, {}
+            )
+        kv_seq_len = key_states.shape[-2]
+        key_position_ids = torch.arange(kv_seq_len, device=query_states.device)[None]
+        # Keep inv_freq in float32: it sets the dtype of the rotation phase, which can be large.
+        self.rotary_emb.inv_freq = self.rotary_emb.inv_freq.to(torch.float32)
+        # cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+        cos, sin = self.rotary_emb(value_states, key_position_ids)
+        rot_query_states = apply_rotary_pos_emb(query_states, cos, sin, position_ids)
+        rot_key_states = apply_rotary_pos_emb(key_states, cos, sin, key_position_ids)
+        rot_key_states = repeat_kv(rot_key_states, self.num_key_value_groups)
+        key_states = repeat_kv(key_states, self.num_key_value_groups)
+        value_states = repeat_kv(value_states, self.num_key_value_groups)
+        if self.limit_distance is None:
+            stationary_key_states = rot_key_states
+            stationary_query_states = rot_query_states
+        else:
+            stationary_key_states = key_states
+            effective_limit_distance = min(self.limit_distance, kv_seq_len - 1)
+            stationary_query_states = (
+                query_states * cos[0, effective_limit_distance]
+            ) + (rotate_half(query_states) * sin[0, effective_limit_distance])
+        if q_len > self.headwise_limit:
+            # head-by-head is slower but more memory efficient
+            for head_i in range(self.num_heads):
+                query_states[:, head_i] = (
+                    (
+                        LambdaMatmul(
+                            rot_key_states[:, head_i],
+                            stationary_key_states[:, head_i],
+                            rot_query_states[:, head_i],
+                            stationary_query_states[:, head_i],
+                            self.sliding_window_size,
+                            self.num_sink_tokens,
+                        )
+                        / math.sqrt(self.head_dim)
+                    )
+                    .softmax()
+                    .matmul(value_states[:, head_i])
+                )
+        else:
+            query_states = (
+                (
+                    LambdaMatmul(
+                        rot_key_states,
+                        stationary_key_states,
+                        rot_query_states,
+                        stationary_query_states,
+                        self.sliding_window_size,
+                        self.num_sink_tokens,
+                    )
+                    / math.sqrt(self.head_dim)
+                )
+                .softmax()
+                .matmul(value_states)
+            )
+        attn_output = query_states
+        if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
+            raise ValueError(
+                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+                f" {attn_output.size()}"
+            )
+        attn_output = attn_output.transpose(1, 2)
+        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+        attn_output = self.o_proj(attn_output)
+        return attn_output, None, past_key_value
+class CBHybridLlamaDecoderLayer(LlamaDecoderLayer):
+    def __init__(self, config: CBLlamaConfig, layer_idx: int):
+        super().__init__(config, layer_idx)
+        if (
+            config.dense_attn_layer_indices
+            and layer_idx not in config.dense_attn_layer_indices
+        ):
+            self.self_attn = CBSparseLlamaAttention(config, layer_idx)
+class CBHybridLlamaModel(LlamaModel):
+    config_class = CBLlamaConfig
+    def __init__(self, config: CBLlamaConfig):
+        super().__init__(config)
+        self.layers = nn.ModuleList(
+            [
+                CBHybridLlamaDecoderLayer(config, layer_idx)
+                for layer_idx in range(config.num_hidden_layers)
+            ]
+        )
+class CBHybridLlamaForCausalLM(LlamaForCausalLM):
+    config_class = CBLlamaConfig
+    def __init__(self, config):
+        super().__init__(config)
+        self.model = CBHybridLlamaModel(config)
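
`CBHybridLlamaDecoderLayer` swaps in `CBSparseLlamaAttention` only when `config.dense_attn_layer_indices` is non-empty and does not contain the layer index. A minimal sketch of that selection rule follows. `attention_kinds` is a hypothetical helper, and the index values in the usage lines are made up for illustration rather than taken from `config.json`.

```python
from typing import List, Optional, Sequence


def attention_kinds(
    num_hidden_layers: int,
    dense_attn_layer_indices: Optional[Sequence[int]],
) -> List[str]:
    """Label each layer 'dense' or 'sparse' using the same condition as
    CBHybridLlamaDecoderLayer.__init__ in the diff above."""
    kinds = []
    for layer_idx in range(num_hidden_layers):
        use_sparse = (
            bool(dense_attn_layer_indices)
            and layer_idx not in dense_attn_layer_indices
        )
        kinds.append("sparse" if use_sparse else "dense")
    return kinds


if __name__ == "__main__":
    # Illustrative values only; the real indices live in config.json.
    print(attention_kinds(4, [0, 3]))
    print(attention_kinds(4, []))
```

One consequence of the truthiness check: an empty or missing `dense_attn_layer_indices` leaves every layer dense, rather than making every layer sparse.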

special_tokens_map.json ADDED

	@@ -0,0 +1,18 @@

+{
+    "bos_token": {
+        "content": "<|begin_of_text|>",
+        "lstrip": false,
+        "normalized": false,
+        "rstrip": false,
+        "single_word": false
+    },
+    "eos_token": {
+        "content": "<|eot_id|>",
+        "lstrip": false,
+        "normalized": false,
+        "rstrip": false,
+        "single_word": false
+    },
+    "memory_token": "<|memory|>",
+    "memory_end_token": "<|memory_end|>"
+}
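
`special_tokens_map.json` names the special tokens by content, while the `added_tokens_decoder` section of `tokenizer_config.json` assigns the ids. The sketch below resolves names to ids from small excerpts of both files. `resolve_ids` and `token_content` are hypothetical helpers, the per-token flags (`lstrip`, `normalized`, and so on) are omitted for brevity, and tokens missing from the excerpt resolve to `None`.

```python
import json
from typing import Dict, Optional, Union

# Excerpt of special_tokens_map.json (flags omitted).
SPECIAL_TOKENS_MAP = json.loads(
    """
    {
        "bos_token": {"content": "<|begin_of_text|>"},
        "eos_token": {"content": "<|eot_id|>"},
        "memory_token": "<|memory|>",
        "memory_end_token": "<|memory_end|>"
    }
    """
)

# Excerpt of added_tokens_decoder from tokenizer_config.json (flags omitted).
ADDED_TOKENS_DECODER_EXCERPT = {
    "128000": {"content": "<|begin_of_text|>"},
    "128002": {"content": "<|memory|>"},
    "128003": {"content": "<|memory_end|>"},
}


def token_content(entry: Union[str, dict]) -> str:
    """Entries are either a bare string or a dict with a 'content' field."""
    return entry["content"] if isinstance(entry, dict) else entry


def resolve_ids(
    special_tokens_map: Dict[str, Union[str, dict]],
    added_tokens_decoder: Dict[str, dict],
) -> Dict[str, Optional[int]]:
    """Map each special-token name to its id, or None if not listed."""
    content_to_id = {
        entry["content"]: int(token_id)
        for token_id, entry in added_tokens_decoder.items()
    }
    return {
        name: content_to_id.get(token_content(entry))
        for name, entry in special_tokens_map.items()
    }


if __name__ == "__main__":
    for name, token_id in resolve_ids(
        SPECIAL_TOKENS_MAP, ADDED_TOKENS_DECODER_EXCERPT
    ).items():
        print(name, token_id)
```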

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,2069 @@

+{
+    "added_tokens_decoder": {
+        "128000": {
+            "content": "<|begin_of_text|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128001": {
+            "content": "<|end_of_text|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128002": {
+            "content": "<|memory|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128003": {
+            "content": "<|memory_end|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128004": {
+            "content": "<|finetune_right_pad_id|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128005": {
+            "content": "<|reserved_special_token_2|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128006": {
+            "content": "<|start_header_id|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128007": {
+            "content": "<|end_header_id|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128008": {
+            "content": "<|eom_id|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128009": {
+            "content": "<|eot_id|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128010": {
+            "content": "<|python_tag|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128011": {
+            "content": "<|reserved_special_token_3|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128012": {
+            "content": "<|reserved_special_token_4|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128013": {
+            "content": "<|reserved_special_token_5|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128014": {
+            "content": "<|reserved_special_token_6|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128015": {
+            "content": "<|reserved_special_token_7|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128016": {
+            "content": "<|reserved_special_token_8|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128017": {
+            "content": "<|reserved_special_token_9|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128018": {
+            "content": "<|reserved_special_token_10|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128019": {
+            "content": "<|reserved_special_token_11|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128020": {
+            "content": "<|reserved_special_token_12|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128021": {
+            "content": "<|reserved_special_token_13|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128022": {
+            "content": "<|reserved_special_token_14|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128023": {
+            "content": "<|reserved_special_token_15|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128024": {
+            "content": "<|reserved_special_token_16|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128025": {
+            "content": "<|reserved_special_token_17|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128026": {
+            "content": "<|reserved_special_token_18|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128027": {
+            "content": "<|reserved_special_token_19|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128028": {
+            "content": "<|reserved_special_token_20|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128029": {
+            "content": "<|reserved_special_token_21|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128030": {
+            "content": "<|reserved_special_token_22|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128031": {
+            "content": "<|reserved_special_token_23|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128032": {
+            "content": "<|reserved_special_token_24|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128033": {
+            "content": "<|reserved_special_token_25|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128034": {
+            "content": "<|reserved_special_token_26|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128035": {
+            "content": "<|reserved_special_token_27|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128036": {
+            "content": "<|reserved_special_token_28|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128037": {
+            "content": "<|reserved_special_token_29|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128038": {
+            "content": "<|reserved_special_token_30|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128039": {
+            "content": "<|reserved_special_token_31|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128040": {
+            "content": "<|reserved_special_token_32|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128041": {
+            "content": "<|reserved_special_token_33|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128042": {
+            "content": "<|reserved_special_token_34|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128043": {
+            "content": "<|reserved_special_token_35|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128044": {
+            "content": "<|reserved_special_token_36|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128045": {
+            "content": "<|reserved_special_token_37|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128046": {
+            "content": "<|reserved_special_token_38|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128047": {
+            "content": "<|reserved_special_token_39|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128048": {
+            "content": "<|reserved_special_token_40|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128049": {
+            "content": "<|reserved_special_token_41|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128050": {
+            "content": "<|reserved_special_token_42|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128051": {
+            "content": "<|reserved_special_token_43|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128052": {
+            "content": "<|reserved_special_token_44|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128053": {
+            "content": "<|reserved_special_token_45|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128054": {
+            "content": "<|reserved_special_token_46|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128055": {
+            "content": "<|reserved_special_token_47|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128056": {
+            "content": "<|reserved_special_token_48|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128057": {
+            "content": "<|reserved_special_token_49|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128058": {
+            "content": "<|reserved_special_token_50|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128059": {
+            "content": "<|reserved_special_token_51|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128060": {
+            "content": "<|reserved_special_token_52|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128061": {
+            "content": "<|reserved_special_token_53|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128062": {
+            "content": "<|reserved_special_token_54|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128063": {
+            "content": "<|reserved_special_token_55|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128064": {
+            "content": "<|reserved_special_token_56|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128065": {
+            "content": "<|reserved_special_token_57|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128066": {
+            "content": "<|reserved_special_token_58|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128067": {
+            "content": "<|reserved_special_token_59|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128068": {
+            "content": "<|reserved_special_token_60|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128069": {
+            "content": "<|reserved_special_token_61|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128070": {
+            "content": "<|reserved_special_token_62|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128071": {
+            "content": "<|reserved_special_token_63|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128072": {
+            "content": "<|reserved_special_token_64|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128073": {
+            "content": "<|reserved_special_token_65|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128074": {
+            "content": "<|reserved_special_token_66|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128075": {
+            "content": "<|reserved_special_token_67|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128076": {
+            "content": "<|reserved_special_token_68|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128077": {
+            "content": "<|reserved_special_token_69|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128078": {
+            "content": "<|reserved_special_token_70|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128079": {
+            "content": "<|reserved_special_token_71|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128080": {
+            "content": "<|reserved_special_token_72|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128081": {
+            "content": "<|reserved_special_token_73|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128082": {
+            "content": "<|reserved_special_token_74|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128083": {
+            "content": "<|reserved_special_token_75|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128084": {
+            "content": "<|reserved_special_token_76|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128085": {
+            "content": "<|reserved_special_token_77|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128086": {
+            "content": "<|reserved_special_token_78|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128087": {
+            "content": "<|reserved_special_token_79|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128088": {
+            "content": "<|reserved_special_token_80|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128089": {
+            "content": "<|reserved_special_token_81|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128090": {
+            "content": "<|reserved_special_token_82|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128091": {
+            "content": "<|reserved_special_token_83|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128092": {
+            "content": "<|reserved_special_token_84|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128093": {
+            "content": "<|reserved_special_token_85|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128094": {
+            "content": "<|reserved_special_token_86|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128095": {
+            "content": "<|reserved_special_token_87|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128096": {
+            "content": "<|reserved_special_token_88|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128097": {
+            "content": "<|reserved_special_token_89|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128098": {
+            "content": "<|reserved_special_token_90|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128099": {
+            "content": "<|reserved_special_token_91|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128100": {
+            "content": "<|reserved_special_token_92|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128101": {
+            "content": "<|reserved_special_token_93|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128102": {
+            "content": "<|reserved_special_token_94|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128103": {
+            "content": "<|reserved_special_token_95|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128104": {
+            "content": "<|reserved_special_token_96|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128105": {
+            "content": "<|reserved_special_token_97|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128106": {
+            "content": "<|reserved_special_token_98|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128107": {
+            "content": "<|reserved_special_token_99|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128108": {
+            "content": "<|reserved_special_token_100|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128109": {
+            "content": "<|reserved_special_token_101|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128110": {
+            "content": "<|reserved_special_token_102|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128111": {
+            "content": "<|reserved_special_token_103|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128112": {
+            "content": "<|reserved_special_token_104|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128113": {
+            "content": "<|reserved_special_token_105|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128114": {
+            "content": "<|reserved_special_token_106|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128115": {
+            "content": "<|reserved_special_token_107|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128116": {
+            "content": "<|reserved_special_token_108|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128117": {
+            "content": "<|reserved_special_token_109|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128118": {
+            "content": "<|reserved_special_token_110|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128119": {
+            "content": "<|reserved_special_token_111|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128120": {
+            "content": "<|reserved_special_token_112|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128121": {
+            "content": "<|reserved_special_token_113|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128122": {
+            "content": "<|reserved_special_token_114|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128123": {
+            "content": "<|reserved_special_token_115|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128124": {
+            "content": "<|reserved_special_token_116|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128125": {
+            "content": "<|reserved_special_token_117|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128126": {
+            "content": "<|reserved_special_token_118|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128127": {
+            "content": "<|reserved_special_token_119|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128128": {
+            "content": "<|reserved_special_token_120|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128129": {
+            "content": "<|reserved_special_token_121|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128130": {
+            "content": "<|reserved_special_token_122|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128131": {
+            "content": "<|reserved_special_token_123|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128132": {
+            "content": "<|reserved_special_token_124|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128133": {
+            "content": "<|reserved_special_token_125|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128134": {
+            "content": "<|reserved_special_token_126|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128135": {
+            "content": "<|reserved_special_token_127|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128136": {
+            "content": "<|reserved_special_token_128|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128137": {
+            "content": "<|reserved_special_token_129|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128138": {
+            "content": "<|reserved_special_token_130|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128139": {
+            "content": "<|reserved_special_token_131|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128140": {
+            "content": "<|reserved_special_token_132|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128141": {
+            "content": "<|reserved_special_token_133|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128142": {
+            "content": "<|reserved_special_token_134|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128143": {
+            "content": "<|reserved_special_token_135|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128144": {
+            "content": "<|reserved_special_token_136|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128145": {
+            "content": "<|reserved_special_token_137|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128146": {
+            "content": "<|reserved_special_token_138|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128147": {
+            "content": "<|reserved_special_token_139|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128148": {
+            "content": "<|reserved_special_token_140|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128149": {
+            "content": "<|reserved_special_token_141|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128150": {
+            "content": "<|reserved_special_token_142|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128151": {
+            "content": "<|reserved_special_token_143|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128152": {
+            "content": "<|reserved_special_token_144|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128153": {
+            "content": "<|reserved_special_token_145|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128154": {
+            "content": "<|reserved_special_token_146|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128155": {
+            "content": "<|reserved_special_token_147|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128156": {
+            "content": "<|reserved_special_token_148|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128157": {
+            "content": "<|reserved_special_token_149|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128158": {
+            "content": "<|reserved_special_token_150|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128159": {
+            "content": "<|reserved_special_token_151|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128160": {
+            "content": "<|reserved_special_token_152|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128161": {
+            "content": "<|reserved_special_token_153|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128162": {
+            "content": "<|reserved_special_token_154|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128163": {
+            "content": "<|reserved_special_token_155|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128164": {
+            "content": "<|reserved_special_token_156|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128165": {
+            "content": "<|reserved_special_token_157|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128166": {
+            "content": "<|reserved_special_token_158|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128167": {
+            "content": "<|reserved_special_token_159|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128168": {
+            "content": "<|reserved_special_token_160|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128169": {
+            "content": "<|reserved_special_token_161|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128170": {
+            "content": "<|reserved_special_token_162|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128171": {
+            "content": "<|reserved_special_token_163|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128172": {
+            "content": "<|reserved_special_token_164|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128173": {
+            "content": "<|reserved_special_token_165|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128174": {
+            "content": "<|reserved_special_token_166|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128175": {
+            "content": "<|reserved_special_token_167|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128176": {
+            "content": "<|reserved_special_token_168|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128177": {
+            "content": "<|reserved_special_token_169|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128178": {
+            "content": "<|reserved_special_token_170|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128179": {
+            "content": "<|reserved_special_token_171|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128180": {
+            "content": "<|reserved_special_token_172|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128181": {
+            "content": "<|reserved_special_token_173|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128182": {
+            "content": "<|reserved_special_token_174|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128183": {
+            "content": "<|reserved_special_token_175|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128184": {
+            "content": "<|reserved_special_token_176|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128185": {
+            "content": "<|reserved_special_token_177|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128186": {
+            "content": "<|reserved_special_token_178|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128187": {
+            "content": "<|reserved_special_token_179|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128188": {
+            "content": "<|reserved_special_token_180|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128189": {
+            "content": "<|reserved_special_token_181|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128190": {
+            "content": "<|reserved_special_token_182|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128191": {
+            "content": "<|reserved_special_token_183|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128192": {
+            "content": "<|reserved_special_token_184|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128193": {
+            "content": "<|reserved_special_token_185|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128194": {
+            "content": "<|reserved_special_token_186|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128195": {
+            "content": "<|reserved_special_token_187|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128196": {
+            "content": "<|reserved_special_token_188|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128197": {
+            "content": "<|reserved_special_token_189|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128198": {
+            "content": "<|reserved_special_token_190|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128199": {
+            "content": "<|reserved_special_token_191|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128200": {
+            "content": "<|reserved_special_token_192|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128201": {
+            "content": "<|reserved_special_token_193|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128202": {
+            "content": "<|reserved_special_token_194|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128203": {
+            "content": "<|reserved_special_token_195|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128204": {
+            "content": "<|reserved_special_token_196|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128205": {
+            "content": "<|reserved_special_token_197|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128206": {
+            "content": "<|reserved_special_token_198|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128207": {
+            "content": "<|reserved_special_token_199|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128208": {
+            "content": "<|reserved_special_token_200|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128209": {
+            "content": "<|reserved_special_token_201|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128210": {
+            "content": "<|reserved_special_token_202|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128211": {
+            "content": "<|reserved_special_token_203|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128212": {
+            "content": "<|reserved_special_token_204|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128213": {
+            "content": "<|reserved_special_token_205|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128214": {
+            "content": "<|reserved_special_token_206|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128215": {
+            "content": "<|reserved_special_token_207|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128216": {
+            "content": "<|reserved_special_token_208|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128217": {
+            "content": "<|reserved_special_token_209|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128218": {
+            "content": "<|reserved_special_token_210|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128219": {
+            "content": "<|reserved_special_token_211|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128220": {
+            "content": "<|reserved_special_token_212|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128221": {
+            "content": "<|reserved_special_token_213|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128222": {
+            "content": "<|reserved_special_token_214|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128223": {
+            "content": "<|reserved_special_token_215|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128224": {
+            "content": "<|reserved_special_token_216|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128225": {
+            "content": "<|reserved_special_token_217|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128226": {
+            "content": "<|reserved_special_token_218|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128227": {
+            "content": "<|reserved_special_token_219|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128228": {
+            "content": "<|reserved_special_token_220|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128229": {
+            "content": "<|reserved_special_token_221|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128230": {
+            "content": "<|reserved_special_token_222|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128231": {
+            "content": "<|reserved_special_token_223|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128232": {
+            "content": "<|reserved_special_token_224|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128233": {
+            "content": "<|reserved_special_token_225|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128234": {
+            "content": "<|reserved_special_token_226|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128235": {
+            "content": "<|reserved_special_token_227|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128236": {
+            "content": "<|reserved_special_token_228|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128237": {
+            "content": "<|reserved_special_token_229|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128238": {
+            "content": "<|reserved_special_token_230|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128239": {
+            "content": "<|reserved_special_token_231|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128240": {
+            "content": "<|reserved_special_token_232|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128241": {
+            "content": "<|reserved_special_token_233|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128242": {
+            "content": "<|reserved_special_token_234|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128243": {
+            "content": "<|reserved_special_token_235|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128244": {
+            "content": "<|reserved_special_token_236|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128245": {
+            "content": "<|reserved_special_token_237|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128246": {
+            "content": "<|reserved_special_token_238|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128247": {
+            "content": "<|reserved_special_token_239|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128248": {
+            "content": "<|reserved_special_token_240|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128249": {
+            "content": "<|reserved_special_token_241|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128250": {
+            "content": "<|reserved_special_token_242|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128251": {
+            "content": "<|reserved_special_token_243|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128252": {
+            "content": "<|reserved_special_token_244|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128253": {
+            "content": "<|reserved_special_token_245|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128254": {
+            "content": "<|reserved_special_token_246|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "128255": {
+            "content": "<|reserved_special_token_247|>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        }
+    },
+    "bos_token": "<|begin_of_text|>",
+    "memory_token": "<|memory|>",
+    "memory_end_token": "<|memory_end|>",
+    "chat_template": "{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n    {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n    {%- set date_string = \"26 Jul 2024\" %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content']|trim %}\n    {%- set messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"\" %}\n{%- endif %}\n\n{#- System message + builtin tools #}\n{{- \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}\n{%- if builtin_tools is defined or tools is not none %}\n    {{- \"Environment: ipython\\n\" }}\n{%- endif %}\n{%- if builtin_tools is defined %}\n    {{- \"Tools: \" + builtin_tools | reject('equalto', 'code_interpreter') | join(\", \") + \"\\n\\n\"}}\n{%- endif %}\n{{- \"Cutting Knowledge Date: December 2023\\n\" }}\n{{- \"Today Date: \" + date_string + \"\\n\\n\" }}\n{%- if tools is not none and not tools_in_user_message %}\n    {{- \"You have access to the following functions. To call a function, please respond with JSON for a function call.\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n{%- endif %}\n{{- system_message }}\n{{- \"<|eot_id|>\" }}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n    {#- Extract the first user message so we can plug it in here #}\n    {%- if messages | length != 0 %}\n        {%- set first_user_message = messages[0]['content']|trim %}\n        {%- set messages = messages[1:] %}\n    {%- else %}\n        {{- raise_exception(\"Cannot put tools in the first user message when there's no first user message!\") }}\n{%- endif %}\n    {{- '<|start_header_id|>user<|end_header_id|>\\n\\n' -}}\n    {{- \"Given the following functions, please respond with a JSON for a function call \" }}\n    {{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n    {{- first_user_message + \"<|eot_id|>\"}}\n{%- endif %}\n\n{%- for message in messages %}\n    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}\n        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }}\n    {%- elif 'tool_calls' in message %}\n        {%- if not message.tool_calls|length == 1 %}\n            {{- raise_exception(\"This model only supports single tool-calls at once!\") }}\n        {%- endif %}\n        {%- set tool_call = message.tool_calls[0].function %}\n        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}\n            {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n            {{- \"<|python_tag|>\" + tool_call.name + \".call(\" }}\n            {%- for arg_name, arg_val in tool_call.arguments | items %}\n                {{- arg_name + '=\"' + arg_val + '\"' }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- endif %}\n                {%- endfor %}\n            {{- \")\" }}\n        {%- else  %}\n            {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n            {{- '{\"name\": \"' + tool_call.name + '\", ' }}\n            {{- '\"parameters\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- \"}\" }}\n        {%- endif %}\n        {%- if builtin_tools is defined %}\n            {#- This means we're in ipython mode #}\n            {{- \"<|eom_id|>\" }}\n        {%- else %}\n            {{- \"<|eot_id|>\" }}\n        {%- endif %}\n    {%- elif message.role == \"tool\" or message.role == \"ipython\" %}\n        {{- \"<|start_header_id|>ipython<|end_header_id|>\\n\\n\" }}\n        {%- if message.content is mapping or message.content is iterable %}\n            {{- message.content | tojson }}\n        {%- else %}\n            {{- message.content }}\n        {%- endif %}\n        {{- \"<|eot_id|>\" }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}\n{%- endif %}\n",
+    "clean_up_tokenization_spaces": true,
+    "eos_token": "<|eot_id|>",
+    "model_input_names": [
+        "input_ids",
+        "attention_mask"
+    ],
+    "model_max_length": 131072,
+    "auto_map": {
+        "AutoTokenizer": [
+            "cb_tokenizer.CBPreTrainedTokenizerFast",
+            null
+        ]
+    }
+}