haukelicht committed on
Commit 04ad7b0 · verified · 1 Parent(s): 84e1cc9

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ base_model: answerdotai/ModernBERT-base
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - text
+ - token-classification
+ - named-entity-recognition
+ - encoder-only
+ - modern-bert
+ - fine-tuned
+ - domain-specific
+ metrics:
+ - seqeval
+
+ model-index:
+ - name: ModernBERT-base-group-mention-detector-uk-manifestos
+   results:
+   - task:
+       type: token-classification
+       name: Token classification
+     dataset:
+       type: custom
+       name: custom human-labeled sequence annotation dataset (see model card details)
+     metrics:
+     - type: seqeval
+       name: social group (seqeval)
+       value: 0.7179054054054054
+
+     - type: seqeval
+       name: political group (seqeval)
+       value: 0.9246231155778895
+
+     - type: seqeval
+       name: political institution (seqeval)
+       value: 0.7064803049555273
+
+     - type: seqeval
+       name: organization, public institution, or collective actor (seqeval)
+       value: 0.6093514328808447
+
+     - type: seqeval
+       name: implicit social group reference (seqeval)
+       value: 0.6971428571428572
+
+ ---
+
+ # ModernBERT-base-group-mention-detector-uk-manifestos
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model finetuned for social group mention detection in political texts.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ Token classification model for (social) group mention detection based on [Licht & Sczepanski (2025)](https://doi.org/10.31219/osf.io/ufb96).
+
+ This token classification model has been finetuned on human sequence annotations of sentences from British parties' election manifestos for the following entity types:
+
+ - social group
+ - implicit social group reference
+ - political group
+ - political institution
+ - organization, public institution, or collective actor
+
+ Please refer to [Licht & Sczepanski (2025)](https://doi.org/10.31219/osf.io/ufb96) for details.
+
+ - **Developed by:** Hauke Licht
+ - **Model type:** modernbert
+ - **Language(s) (NLP):** en
+ - **License:** apache-2.0
+ - **Finetuned from model:** answerdotai/ModernBERT-base
+ - **Funded by:** *Center for Comparative and International Studies* of the ETH Zurich and the University of Zurich and the *Deutsche Forschungsgemeinschaft* (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/haukelicht/group_mention_detection/release/
+ - **Paper:** https://doi.org/10.31219/osf.io/ufb96
+ - **Demo:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ - Evaluation of the classifier on held-out data shows that it makes mistakes (see section *Results*).
+ - The model has been finetuned only on human-annotated sentences sampled from British parties' election manifestos. Applying the classifier in other domains can lead to higher error rates than those reported in section *Results* below.
+ - The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.
+
+ #### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ - Users who want to apply the model outside its training data domain (British parties' election programs) should evaluate its performance on the target data.
+ - Users who want to apply the model outside its training data domain (British parties' election programs) should continue finetuning this model on labeled data from the target domain.
+
+ ### How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import pipeline
+
+ model_id = "haukelicht/roberta-base-group-mention-detector-uk-manifestos"
+
+ # "simple" aggregation merges token-level predictions into entity spans
+ classifier = pipeline(task="ner", model=model_id, aggregation_strategy="simple")
+
+ text = "Our party fights for the deprived and the vulnerable in our country."
+ annotations = classifier(text)
+ print(annotations)
+
+ # get annotations' character start and end indexes
+ locations = [(anno['start'], anno['end']) for anno in annotations]
+ print(locations)
+
+ # index the source text using the first annotation as an example
+ loc = locations[0]
+ print(text[slice(*loc)])
+ ```
+
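With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The following is a minimal sketch, assuming that output format, of how mentions could be collected per entity type. The helper `group_mentions` and the values in `mock_annotations` are hypothetical and only illustrate the data structure; they are not actual model predictions.

```python
from collections import defaultdict


def group_mentions(text, annotations):
    """Collect mention strings by entity type from pipeline-style annotations."""
    mentions = defaultdict(list)
    for anno in annotations:
        # Slice the source text instead of using anno["word"], which may
        # differ from the original text in whitespace.
        mentions[anno["entity_group"]].append(text[anno["start"]:anno["end"]])
    return dict(mentions)


text = "Our party fights for the deprived and the vulnerable in our country."

# Hypothetical annotations for illustration only (not actual model output).
mock_annotations = [
    {"entity_group": "social group", "score": 0.99, "word": " the deprived", "start": 21, "end": 33},
    {"entity_group": "social group", "score": 0.98, "word": " the vulnerable", "start": 38, "end": 52},
]

grouped = group_mentions(text, mock_annotations)
print(grouped)
```

Slicing the original text by character offsets keeps the extracted mentions identical to the source string, which is usually what downstream analyses need.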
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Training Hyperparameters
+
+ - epochs: 2
+ - learning rate: 5e-05
+ - batch size: 8
+ - weight decay: 0.3
+ - warmup ratio: 0.1
+
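To make the hyperparameters above concrete, here is a hedged sketch of how they translate into optimizer and warmup steps. The dictionary keys follow the argument names of `transformers.TrainingArguments`, but this mapping is an assumption, as is reading "batch size" as the per-device batch size. The training set size `n_train=4004` is hypothetical, and rounding the warmup steps up mirrors my understanding of the Hugging Face `Trainer`, not something stated in this card.

```python
import math

# Hyperparameters as listed above. The key names are an assumed mapping to
# `transformers.TrainingArguments`, not taken from the source.
hyperparams = {
    "num_train_epochs": 2,
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,  # assumes "batch size" is per device
    "weight_decay": 0.3,
    "warmup_ratio": 0.1,
}


def schedule_steps(n_train, params):
    """Return (total optimizer steps, warmup steps) for a single-device run."""
    steps_per_epoch = math.ceil(n_train / params["per_device_train_batch_size"])
    total_steps = steps_per_epoch * params["num_train_epochs"]
    # Warmup covers the first `warmup_ratio` share of all steps, rounded up.
    warmup_steps = math.ceil(total_steps * params["warmup_ratio"])
    return total_steps, warmup_steps


# Hypothetical training set size, for illustration only.
total, warmup = schedule_steps(n_train=4004, params=hyperparams)
print(total, warmup)
```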
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ - seq-eval F1: strict sequence labeling evaluation metric per the CoNLL-2000 shared task, based on https://github.com/chakki-works/seqeval
+ - "soft" seq-eval F1: a more lenient sequence labeling evaluation metric that reports span-level average performance summarized across examples, per https://github.com/haukelicht/soft-seqeval
+ - sentence-level F1: binary measure of detection performance that considers a sentence a positive example/prediction if it contains at least one entity of the given type
+
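The difference between strict span matching and sentence-level detection can be sketched in a few lines. This is a simplified, self-contained illustration of the idea and not the `seqeval` or `soft-seqeval` implementation: the helpers `bio_spans`, `strict_f1`, and `mentions_type` and the toy tag sequences are hypothetical. A predicted span counts as correct in the strict sense only if its type and both boundaries match the gold span exactly.

```python
def bio_spans(tags):
    """Extract (type, start, end) spans from a BIO tag sequence (end exclusive)."""
    spans, start, current = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            spans.append((current, start, i))
            current = None
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    return spans


def strict_f1(gold_seqs, pred_seqs, entity_type):
    """Span-level F1 for one entity type, counting exact matches only."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g = {s for s in bio_spans(gold) if s[0] == entity_type}
        p = {s for s in bio_spans(pred) if s[0] == entity_type}
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def mentions_type(tags, entity_type):
    """Sentence-level label: True if the sequence has at least one span of the type."""
    return any(etype == entity_type for etype, _, _ in bio_spans(tags))


# Toy example: the first prediction truncates the gold "social group" span.
gold = [["O", "B-social group", "I-social group", "O"], ["B-political group", "O"]]
pred = [["O", "B-social group", "O", "O"], ["B-political group", "O"]]

print(strict_f1(gold, pred, "social group"))
print(strict_f1(gold, pred, "political group"))
print(mentions_type(pred[0], "social group"))
```

In the toy data, the truncated span earns no credit under strict matching, yet the sentence still counts as a correct detection at the sentence level. This is one reason sentence-level scores can be much higher than strict span-level scores.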
+ ### Results
+
+ | type                                                  | seq-eval F1   | soft seq-eval F1    | sentence-level F1    |
+ |-------------------------------------------------------|---------------|---------------------|----------------------|
+ | social group                                          | 0.718         | 0.775               | 0.938                |
+ | political group                                       | 0.925         | 0.933               | 0.990                |
+ | political institution                                 | 0.706         | 0.736               | 0.954                |
+ | organization, public institution, or collective actor | 0.609         | 0.605               | 0.931                |
+ | implicit social group reference                       | 0.697         | 0.600               | 0.951                |
+
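The F1 values in the table are harmonic means of precision and recall. As a worked check, the sketch below recomputes the strict seq-eval F1 for "political group" from the precision and recall stored in the `test_results.json` file uploaded in this commit. The function name `f1_score` is just an illustrative helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Strict seqeval precision and recall for "political group" (test_results.json).
precision = 0.9169435215946844
recall = 0.9324324324324325

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.925, matching the table row for "political group"
```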
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ Licht, H., & Sczepanski, R. (2025). Detecting Group Mentions in Political Rhetoric: A Supervised Learning Approach. Forthcoming in *British Journal of Political Science*. Preprint available at [OSF](https://doi.org/10.31219/osf.io/ufb96)
+
+ ## More Information
+
+ https://github.com/haukelicht/group_mention_detection/release
+
+ ## Model Card Contact
+
config.json ADDED
@@ -0,0 +1,71 @@
+ {
+   "architectures": [
+     "ModernBertForTokenClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "id2label": {
+     "0": "O",
+     "1": "I-social group",
+     "2": "I-political group",
+     "3": "I-political institution",
+     "4": "I-organization, public institution, or collective actor",
+     "5": "I-implicit social group reference",
+     "6": "B-social group",
+     "7": "B-political group",
+     "8": "B-political institution",
+     "9": "B-organization, public institution, or collective actor",
+     "10": "B-implicit social group reference"
+   },
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "label2id": {
+     "B-implicit social group reference": 10,
+     "B-organization, public institution, or collective actor": 9,
+     "B-political group": 7,
+     "B-political institution": 8,
+     "B-social group": 6,
+     "I-implicit social group reference": 5,
+     "I-organization, public institution, or collective actor": 4,
+     "I-political group": 2,
+     "I-political institution": 3,
+     "I-social group": 1,
+     "O": 0
+   },
+   "layer_norm_eps": 1e-05,
+   "local_attention": 128,
+   "local_rope_theta": 10000.0,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "repad_logits_with_grad": false,
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "vocab_size": 50368
+ }
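The `id2label` mapping in `config.json` uses BIO labels whose entity types contain spaces and commas. A minimal sketch of how such labels could be split into prefix and entity type follows; the helper `split_label` is hypothetical, and the dictionary is only an excerpt of the mapping above.

```python
# Excerpt of the `id2label` mapping from config.json (not the full mapping).
id2label = {
    "0": "O",
    "1": "I-social group",
    "4": "I-organization, public institution, or collective actor",
    "6": "B-social group",
    "9": "B-organization, public institution, or collective actor",
}


def split_label(label):
    """Split a BIO label into (prefix, entity type); "O" has no entity type."""
    if label == "O":
        return "O", None
    prefix, entity_type = label.split("-", 1)  # split on the first hyphen only
    return prefix, entity_type


# Unique entity types covered by the excerpt, ignoring the "O" label.
entity_types = sorted(
    {split_label(label)[1] for label in id2label.values() if label != "O"}
)
print(entity_types)
```

Splitting on the first hyphen only keeps the entity type intact even if a type name were to contain a hyphen itself.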
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d8ae9ccf820d1c7400359186156a4674d39c02665ad6cb902fefe6edf558051
+ size 598467476
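The three lines above are a Git LFS pointer, not the weights themselves. As a rough sanity check, the sketch below parses the pointer and estimates the parameter count from the file size. It assumes all weights are stored as 4-byte `float32` values (per `torch_dtype` in `config.json`) and ignores the small safetensors header, so the result is only an approximate upper bound.

```python
# Git LFS pointer content as shown in the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9d8ae9ccf820d1c7400359186156a4674d39c02665ad6cb902fefe6edf558051
size 598467476"""

# Each pointer line is "<key> <value>".
fields = dict(line.split(" ", 1) for line in pointer.splitlines())
size_bytes = int(fields["size"])

# Assumption: every parameter takes 4 bytes (float32); header bytes are ignored.
approx_params = size_bytes // 4
print(f"{approx_params / 1e6:.1f}M parameters (approx.)")
```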
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
test_results.json ADDED
@@ -0,0 +1,89 @@
+ {
+   "test_loss": 0.15207402408123016,
+   "test_seqeval-macro_f1": 0.7311006231925047,
+   "test_seqeval-macro_precision": 0.715666514743152,
+   "test_seqeval-macro_recall": 0.7478751050069639,
+   "test_seqeval-micro_f1": 0.7277296844456856,
+   "test_seqeval-micro_precision": 0.7096949891067538,
+   "test_seqeval-micro_recall": 0.7467048710601719,
+   "test_seqeval-political group_f1": 0.9246231155778895,
+   "test_seqeval-political group_precision": 0.9169435215946844,
+   "test_seqeval-political group_recall": 0.9324324324324325,
+   "test_seqeval-social group_f1": 0.7179054054054054,
+   "test_seqeval-social group_precision": 0.6978653530377669,
+   "test_seqeval-social group_recall": 0.7391304347826086,
+   "test_seqeval-political institution_f1": 0.7064803049555273,
+   "test_seqeval-political institution_precision": 0.702020202020202,
+   "test_seqeval-political institution_recall": 0.710997442455243,
+   "test_seqeval-organization, public institution, or collective actor_f1": 0.6093514328808447,
+   "test_seqeval-organization, public institution, or collective actor_precision": 0.5722379603399433,
+   "test_seqeval-organization, public institution, or collective actor_recall": 0.6516129032258065,
+   "test_seqeval-implicit social group reference_f1": 0.6971428571428572,
+   "test_seqeval-implicit social group reference_precision": 0.6892655367231638,
+   "test_seqeval-implicit social group reference_recall": 0.7052023121387283,
+   "test_softseqeval-macro_f1": 0.7298668328461072,
+   "test_softseqeval-macro_precision": 0.7460756740093278,
+   "test_softseqeval-macro_recall": 0.7291130034606972,
+   "test_softseqeval-micro_f1": 0.8169134258052156,
+   "test_softseqeval-micro_precision": 0.8347834877523216,
+   "test_softseqeval-micro_recall": 0.8186652452653833,
+   "test_softseqeval-political group_f1": 0.9325798973339957,
+   "test_softseqeval-political group_precision": 0.9377049180327869,
+   "test_softseqeval-political group_recall": 0.9331147540983608,
+   "test_softseqeval-social group_f1": 0.7754421558180423,
+   "test_softseqeval-social group_precision": 0.7973313606224999,
+   "test_softseqeval-social group_recall": 0.780500643870897,
+   "test_softseqeval-political institution_f1": 0.736250695921549,
+   "test_softseqeval-political institution_precision": 0.7574972314507199,
+   "test_softseqeval-political institution_recall": 0.7338113455389204,
+   "test_softseqeval-organization, public institution, or collective actor_f1": 0.605375880565754,
+   "test_softseqeval-organization, public institution, or collective actor_precision": 0.6332851115129596,
+   "test_softseqeval-organization, public institution, or collective actor_recall": 0.5975093429776973,
+   "test_softseqeval-implicit social group reference_f1": 0.5996855345911949,
+   "test_softseqeval-implicit social group reference_precision": 0.6045597484276729,
+   "test_softseqeval-implicit social group reference_recall": 0.60062893081761,
+   "test_doclevel-micro_precision": 0.9487354750512645,
+   "test_doclevel-micro_recall": 0.9487354750512645,
+   "test_doclevel-micro_f1": 0.9487354750512645,
+   "test_doclevel-political group_precision": 0.9904306220095693,
+   "test_doclevel-political group_recall": 0.9904306220095693,
+   "test_doclevel-political group_f1": 0.9904306220095693,
+   "test_doclevel-social group_precision": 0.9384825700615175,
+   "test_doclevel-social group_recall": 0.9384825700615175,
+   "test_doclevel-social group_f1": 0.9384825700615175,
+   "test_doclevel-political institution_precision": 0.9542036910457963,
+   "test_doclevel-political institution_recall": 0.9542036910457963,
+   "test_doclevel-political institution_f1": 0.9542036910457963,
+   "test_doclevel-organization, public institution, or collective actor_precision": 0.9309637730690362,
+   "test_doclevel-organization, public institution, or collective actor_recall": 0.9309637730690362,
+   "test_doclevel-organization, public institution, or collective actor_f1": 0.9309637730690362,
+   "test_doclevel-implicit social group reference_precision": 0.9514695830485305,
+   "test_doclevel-implicit social group reference_recall": 0.9514695830485305,
+   "test_doclevel-implicit social group reference_f1": 0.9514695830485305,
+   "test_wordlevel-accuracy": 0.9597014436769569,
+   "test_wordlevel-macro_f1": 0.8425886013010025,
+   "test_wordlevel-macro_precision": 0.8545974680596337,
+   "test_wordlevel-macro_recall": 0.8317947605945467,
+   "test_wordlevel-O_f1": 0.9801661298525174,
+   "test_wordlevel-O_precision": 0.9767258530725628,
+   "test_wordlevel-O_recall": 0.9836307273552094,
+   "test_wordlevel-political group_f1": 0.9549180327868853,
+   "test_wordlevel-political group_precision": 0.9529652351738241,
+   "test_wordlevel-political group_recall": 0.9568788501026694,
+   "test_wordlevel-social group_f1": 0.8473444613050076,
+   "test_wordlevel-social group_precision": 0.8735919899874843,
+   "test_wordlevel-social group_recall": 0.8226281673541543,
+   "test_wordlevel-political institution_f1": 0.8114478114478114,
+   "test_wordlevel-political institution_precision": 0.8596908442330559,
+   "test_wordlevel-political institution_recall": 0.7683315621679064,
+   "test_wordlevel-organization, public institution, or collective actor_f1": 0.7296551724137931,
+   "test_wordlevel-organization, public institution, or collective actor_precision": 0.720708446866485,
+   "test_wordlevel-organization, public institution, or collective actor_recall": 0.7388268156424581,
+   "test_wordlevel-implicit social group reference_f1": 0.732,
+   "test_wordlevel-implicit social group reference_precision": 0.7439024390243902,
+   "test_wordlevel-implicit social group reference_recall": 0.7204724409448819,
+   "test_runtime": 7.7412,
+   "test_samples_per_second": 188.989,
+   "test_steps_per_second": 11.884,
+   "epoch": 2.0
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,946 @@
+ {
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "|||IP_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "1": {
+       "content": "<|padding|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50254": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50255": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50256": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50257": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50258": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50259": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50260": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50261": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50262": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50263": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50264": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50265": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50266": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50267": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50268": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50269": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50270": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50271": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50272": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50273": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50274": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50275": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50276": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50277": {
+       "content": "|||EMAIL_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50278": {
+       "content": "|||PHONE_NUMBER|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50279": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50280": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50281": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50282": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50283": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50284": {
+       "content": "[MASK]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50285": {
+       "content": "[unused0]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50286": {
+       "content": "[unused1]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50287": {
+       "content": "[unused2]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50288": {
+       "content": "[unused3]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50289": {
+       "content": "[unused4]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50290": {
+       "content": "[unused5]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50291": {
+       "content": "[unused6]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50292": {
+       "content": "[unused7]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50293": {
+       "content": "[unused8]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50294": {
+       "content": "[unused9]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50295": {
+       "content": "[unused10]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50296": {
+       "content": "[unused11]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50297": {
+       "content": "[unused12]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50298": {
+       "content": "[unused13]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50299": {
+       "content": "[unused14]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50300": {
+       "content": "[unused15]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50301": {
+       "content": "[unused16]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50302": {
+       "content": "[unused17]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50303": {
+       "content": "[unused18]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50304": {
+       "content": "[unused19]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50305": {
+       "content": "[unused20]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50306": {
+       "content": "[unused21]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50307": {
+       "content": "[unused22]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50308": {
+       "content": "[unused23]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50309": {
+       "content": "[unused24]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50310": {
+       "content": "[unused25]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50311": {
+       "content": "[unused26]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50312": {
+       "content": "[unused27]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50313": {
+       "content": "[unused28]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50314": {
+       "content": "[unused29]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50315": {
+       "content": "[unused30]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50316": {
+       "content": "[unused31]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50317": {
+       "content": "[unused32]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50318": {
+       "content": "[unused33]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50319": {
+       "content": "[unused34]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50320": {
+       "content": "[unused35]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50321": {
+       "content": "[unused36]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50322": {
+       "content": "[unused37]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50323": {
+       "content": "[unused38]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50324": {
+       "content": "[unused39]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50325": {
+       "content": "[unused40]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50326": {
+       "content": "[unused41]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50327": {
+       "content": "[unused42]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50328": {
+       "content": "[unused43]",
+       "lstrip": false,
+       "normalized": true,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": false
619
+ },
620
+ "50329": {
621
+ "content": "[unused44]",
622
+ "lstrip": false,
623
+ "normalized": true,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": false
627
+ },
628
+ "50330": {
629
+ "content": "[unused45]",
630
+ "lstrip": false,
631
+ "normalized": true,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": false
635
+ },
636
+ "50331": {
637
+ "content": "[unused46]",
638
+ "lstrip": false,
639
+ "normalized": true,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": false
643
+ },
644
+ "50332": {
645
+ "content": "[unused47]",
646
+ "lstrip": false,
647
+ "normalized": true,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": false
651
+ },
652
+ "50333": {
653
+ "content": "[unused48]",
654
+ "lstrip": false,
655
+ "normalized": true,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": false
659
+ },
660
+ "50334": {
661
+ "content": "[unused49]",
662
+ "lstrip": false,
663
+ "normalized": true,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": false
667
+ },
668
+ "50335": {
669
+ "content": "[unused50]",
670
+ "lstrip": false,
671
+ "normalized": true,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": false
675
+ },
676
+ "50336": {
677
+ "content": "[unused51]",
678
+ "lstrip": false,
679
+ "normalized": true,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": false
683
+ },
684
+ "50337": {
685
+ "content": "[unused52]",
686
+ "lstrip": false,
687
+ "normalized": true,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": false
691
+ },
692
+ "50338": {
693
+ "content": "[unused53]",
694
+ "lstrip": false,
695
+ "normalized": true,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": false
699
+ },
700
+ "50339": {
701
+ "content": "[unused54]",
702
+ "lstrip": false,
703
+ "normalized": true,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": false
707
+ },
708
+ "50340": {
709
+ "content": "[unused55]",
710
+ "lstrip": false,
711
+ "normalized": true,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": false
715
+ },
716
+ "50341": {
717
+ "content": "[unused56]",
718
+ "lstrip": false,
719
+ "normalized": true,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": false
723
+ },
724
+ "50342": {
725
+ "content": "[unused57]",
726
+ "lstrip": false,
727
+ "normalized": true,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": false
731
+ },
732
+ "50343": {
733
+ "content": "[unused58]",
734
+ "lstrip": false,
735
+ "normalized": true,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": false
739
+ },
740
+ "50344": {
741
+ "content": "[unused59]",
742
+ "lstrip": false,
743
+ "normalized": true,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": false
747
+ },
748
+ "50345": {
749
+ "content": "[unused60]",
750
+ "lstrip": false,
751
+ "normalized": true,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": false
755
+ },
756
+ "50346": {
757
+ "content": "[unused61]",
758
+ "lstrip": false,
759
+ "normalized": true,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": false
763
+ },
764
+ "50347": {
765
+ "content": "[unused62]",
766
+ "lstrip": false,
767
+ "normalized": true,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": false
771
+ },
772
+ "50348": {
773
+ "content": "[unused63]",
774
+ "lstrip": false,
775
+ "normalized": true,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": false
779
+ },
780
+ "50349": {
781
+ "content": "[unused64]",
782
+ "lstrip": false,
783
+ "normalized": true,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": false
787
+ },
788
+ "50350": {
789
+ "content": "[unused65]",
790
+ "lstrip": false,
791
+ "normalized": true,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": false
795
+ },
796
+ "50351": {
797
+ "content": "[unused66]",
798
+ "lstrip": false,
799
+ "normalized": true,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": false
803
+ },
804
+ "50352": {
805
+ "content": "[unused67]",
806
+ "lstrip": false,
807
+ "normalized": true,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": false
811
+ },
812
+ "50353": {
813
+ "content": "[unused68]",
814
+ "lstrip": false,
815
+ "normalized": true,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": false
819
+ },
820
+ "50354": {
821
+ "content": "[unused69]",
822
+ "lstrip": false,
823
+ "normalized": true,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": false
827
+ },
828
+ "50355": {
829
+ "content": "[unused70]",
830
+ "lstrip": false,
831
+ "normalized": true,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": false
835
+ },
836
+ "50356": {
837
+ "content": "[unused71]",
838
+ "lstrip": false,
839
+ "normalized": true,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": false
843
+ },
844
+ "50357": {
845
+ "content": "[unused72]",
846
+ "lstrip": false,
847
+ "normalized": true,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": false
851
+ },
852
+ "50358": {
853
+ "content": "[unused73]",
854
+ "lstrip": false,
855
+ "normalized": true,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": false
859
+ },
860
+ "50359": {
861
+ "content": "[unused74]",
862
+ "lstrip": false,
863
+ "normalized": true,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": false
867
+ },
868
+ "50360": {
869
+ "content": "[unused75]",
870
+ "lstrip": false,
871
+ "normalized": true,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": false
875
+ },
876
+ "50361": {
877
+ "content": "[unused76]",
878
+ "lstrip": false,
879
+ "normalized": true,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": false
883
+ },
884
+ "50362": {
885
+ "content": "[unused77]",
886
+ "lstrip": false,
887
+ "normalized": true,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": false
891
+ },
892
+ "50363": {
893
+ "content": "[unused78]",
894
+ "lstrip": false,
895
+ "normalized": true,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": false
899
+ },
900
+ "50364": {
901
+ "content": "[unused79]",
902
+ "lstrip": false,
903
+ "normalized": true,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": false
907
+ },
908
+ "50365": {
909
+ "content": "[unused80]",
910
+ "lstrip": false,
911
+ "normalized": true,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": false
915
+ },
916
+ "50366": {
917
+ "content": "[unused81]",
918
+ "lstrip": false,
919
+ "normalized": true,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": false
923
+ },
924
+ "50367": {
925
+ "content": "[unused82]",
926
+ "lstrip": false,
927
+ "normalized": true,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": false
931
+ }
932
+ },
933
+ "clean_up_tokenization_spaces": true,
934
+ "cls_token": "[CLS]",
935
+ "extra_special_tokens": {},
936
+ "mask_token": "[MASK]",
937
+ "model_input_names": [
938
+ "input_ids",
939
+ "attention_mask"
940
+ ],
941
+ "model_max_length": 8192,
942
+ "pad_token": "[PAD]",
943
+ "sep_token": "[SEP]",
944
+ "tokenizer_class": "PreTrainedTokenizer",
945
+ "unk_token": "[UNK]"
946
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8aec0a293ce0e67b9b9323cbef041f5d8680540341d63a7c24bbfd51d2f79fa
3
+ size 5841