jakelever committed on
Commit c2e7c4a · verified · 1 Parent(s): 99ae729

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ task: token-classification
+ tags:
+ - biomedical
+ - bionlp
+ license: mit
+ base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
+ ---
+
+ # bioner_medmentions_st21pv
+
+ This is a named entity recognition model fine-tuned from the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model. It predicts spans with 14 possible labels: **Anatomy, Chemicals & Drugs, Concepts & Ideas, Devices, Disorders, Genes & Molecular Sequences, Geographic Areas, Living Beings, Objects, Occupations, Organizations, Phenomena, Physiology and Procedures**.
+
+ The code used for training this model can be found at https://github.com/Glasgow-AI4BioMed/bioner along with links to other biomedical NER models trained on well-known biomedical corpora. The source dataset information is below.
+
+ ## Example Usage
+
+ The code below loads the model and applies it to the provided text. It uses a simple aggregation strategy to post-process the individual tokens into larger multi-token entities where needed.
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model as part of an NER pipeline
+ ner_pipeline = pipeline("token-classification",
+                         model="Glasgow-AI4BioMed/bioner_medmentions_st21pv",
+                         aggregation_strategy="max")
+
+ # Apply it to some text
+ ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes for NSCLC patients receiving erlotinib.")
+
+ # Output (start/end are character offsets into the input text):
+ # [ {"entity_group": "Disorders", "score": 0.62466, "word": "egfr t790m mutations", "start": 0, "end": 20},
+ #   {"entity_group": "Disorders", "score": 0.98835, "word": "nsclc", "start": 70, "end": 75},
+ #   {"entity_group": "Chemicals & Drugs", "score": 0.97885, "word": "erlotinib", "start": 95, "end": 104} ]
+ ```
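Because the base model is uncased, the `word` field returned by the pipeline is lowercased. The following is a minimal sketch, assuming `start` and `end` are character offsets into the input string, of recovering the original casing and keeping only confident predictions. The `entities` list is hand-built for illustration (it is not a model run), and `make_entity`, `confident_mentions` and the `0.9` threshold are hypothetical choices, not part of the model repo.

```python
text = ("EGFR T790M mutations have been known to affect treatment "
        "outcomes for NSCLC patients receiving erlotinib.")


def make_entity(group, score, surface):
    """Build a pipeline-style dict; offsets are located with str.index."""
    start = text.index(surface)
    return {"entity_group": group, "score": score,
            "word": surface.lower(), "start": start, "end": start + len(surface)}


# Hand-built stand-in for the pipeline output (illustrative scores only)
entities = [
    make_entity("Disorders", 0.62, "EGFR T790M mutations"),
    make_entity("Disorders", 0.99, "NSCLC"),
    make_entity("Chemicals & Drugs", 0.98, "erlotinib"),
]


def confident_mentions(text, entities, min_score=0.9):
    """Return (label, original-cased text) pairs at or above a score threshold."""
    return [(e["entity_group"], text[e["start"]:e["end"]])
            for e in entities if e["score"] >= min_score]


mentions = confident_mentions(text, entities)
print(mentions)
```

Slicing the input with the offsets, rather than reading `word`, keeps the surface form exactly as it appears in the source text.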
+
+ ## Dataset Info
+
+ **Source:** The ST21pv version of MedMentions was downloaded from: https://github.com/chanzuckerberg/MedMentions/tree/master/st21pv
+
+ The dataset should be cited with: Mohan, Sunil, and Donghui Li. "MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts." Automated Knowledge Base Construction (AKBC), 2019, https://openreview.net/forum?id=SylxCx5pTQ. DOI: [10.24432/C5G59C](https://doi.org/10.24432/C5G59C)
+
+ An overview of semantic types can be found at: https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html
+
+ **Preprocessing:** The training, validation and test splits of the original dataset were kept. Concept identifiers (CUIs) were used to map each annotation to its associated UMLS entry and recover its semantic types (from the MRSTY.RRF UMLS file). The semantic types provided in MedMentions were not used. Annotations were then mapped to *semantic group* names using the Semantic Groups file available at: https://www.nlm.nih.gov/research/umls/knowledge_sources/semantic_network/index.html. This contrasts with the fine-grained version, which maps annotations to *semantic types*. The preprocessing script for this dataset is [prepare_medmentions.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_medmentions.py.py), run without the --finegrain flag.
+
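The CUI to semantic type to semantic group mapping described above can be sketched as follows. This is not the code from `prepare_medmentions.py`. It assumes the pipe-delimited layouts of MRSTY.RRF (`CUI|TUI|STN|STY|ATUI|CVF|`) and of the Semantic Groups file (`abbreviation|group name|TUI|type name`). The sample rows use made-up placeholder CUIs, and the function names are hypothetical. How the original script resolves a CUI whose types fall into more than one group is not stated in this README, so the sketch simply returns the set of groups.

```python
from collections import defaultdict

# Illustrative rows only; the CUIs and ATUIs are placeholders, not real UMLS identifiers.
MRSTY_SAMPLE = """\
C9999901|T047|B2.2.1.2.1|Disease or Syndrome|AT00000001||
C9999902|T121|A1.4.1.1.1|Pharmacologic Substance|AT00000002||
C9999902|T109|A1.4.1.2.1|Organic Chemical|AT00000003||
"""

SEMGROUPS_SAMPLE = """\
DISO|Disorders|T047|Disease or Syndrome
CHEM|Chemicals & Drugs|T121|Pharmacologic Substance
CHEM|Chemicals & Drugs|T109|Organic Chemical
"""


def load_cui_to_tuis(mrsty_text):
    """Map each CUI to the set of semantic type identifiers (TUIs) it carries."""
    cui_to_tuis = defaultdict(set)
    for line in mrsty_text.splitlines():
        if line.strip():
            cui, tui = line.split("|")[:2]
            cui_to_tuis[cui].add(tui)
    return cui_to_tuis


def load_tui_to_group(semgroups_text):
    """Map each TUI to the name of its semantic group."""
    tui_to_group = {}
    for line in semgroups_text.splitlines():
        if line.strip():
            _abbrev, group_name, tui, _type_name = line.split("|")
            tui_to_group[tui] = group_name
    return tui_to_group


def groups_for_cui(cui, cui_to_tuis, tui_to_group):
    """Return the set of semantic group names reachable from a CUI."""
    return {tui_to_group[t] for t in cui_to_tuis.get(cui, ()) if t in tui_to_group}


cui_to_tuis = load_cui_to_tuis(MRSTY_SAMPLE)
tui_to_group = load_tui_to_group(SEMGROUPS_SAMPLE)
print(groups_for_cui("C9999902", cui_to_tuis, tui_to_group))
```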
+ ## Performance
+
+ The span-level performance on the test split for each label is shown in the table below. The full performance results are available in the model repo in Markdown format for viewing and JSON format for easier loading. These include the performance at token level (with individual B- and I- labels, as the token classifier uses IOB2 token labelling).
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | Anatomy | 0.656 | 0.672 | 0.664 | 3277 |
+ | Chemicals & Drugs | 0.748 | 0.745 | 0.747 | 7398 |
+ | Concepts & Ideas | 0.515 | 0.370 | 0.430 | 3683 |
+ | Devices | 0.447 | 0.372 | 0.406 | 355 |
+ | Disorders | 0.691 | 0.641 | 0.665 | 8109 |
+ | Genes & Molecular Sequences | 0.506 | 0.567 | 0.535 | 1115 |
+ | Geographic Areas | 0.671 | 0.737 | 0.703 | 598 |
+ | Living Beings | 0.718 | 0.739 | 0.728 | 3994 |
+ | Objects | 0.518 | 0.598 | 0.555 | 336 |
+ | Occupations | 0.367 | 0.480 | 0.416 | 196 |
+ | Organizations | 0.504 | 0.634 | 0.561 | 382 |
+ | Phenomena | 0.206 | 0.271 | 0.234 | 269 |
+ | Physiology | 0.560 | 0.582 | 0.571 | 3833 |
+ | Procedures | 0.597 | 0.607 | 0.602 | 6599 |
+ | macro avg | 0.550 | 0.573 | 0.558 | 40144 |
+ | weighted avg | 0.641 | 0.630 | 0.634 | 40144 |
+
+
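As a sanity check on how the two summary rows relate to the per-label rows, here is a small sketch that recomputes F1 from the rounded precision and recall values in the table. It assumes the usual definitions: macro average as the unweighted mean over labels, weighted average as the support-weighted mean. Because the inputs are rounded to three decimals, the results only match the table approximately.

```python
# (precision, recall, support) per label, copied from the test-split table above
rows = {
    "Anatomy": (0.656, 0.672, 3277),
    "Chemicals & Drugs": (0.748, 0.745, 7398),
    "Concepts & Ideas": (0.515, 0.370, 3683),
    "Devices": (0.447, 0.372, 355),
    "Disorders": (0.691, 0.641, 8109),
    "Genes & Molecular Sequences": (0.506, 0.567, 1115),
    "Geographic Areas": (0.671, 0.737, 598),
    "Living Beings": (0.718, 0.739, 3994),
    "Objects": (0.518, 0.598, 336),
    "Occupations": (0.367, 0.480, 196),
    "Organizations": (0.504, 0.634, 382),
    "Phenomena": (0.206, 0.271, 269),
    "Physiology": (0.560, 0.582, 3833),
    "Procedures": (0.597, 0.607, 6599),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in rows.items()}
total_support = sum(s for _, _, s in rows.values())

# Macro: every label counts equally; weighted: labels count by their support
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
weighted_f1 = sum(f1_scores[label] * s for label, (_, _, s) in rows.items()) / total_support

print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```

The gap between the two averages reflects the weak scores on small classes such as Phenomena and Occupations, which pull the macro average down but carry little weight in the weighted one.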
+ ## Hyperparameters
+
+ Hyperparameter tuning was done with [optuna](https://optuna.org/) and the [hyperparameter_search](https://huggingface.co/docs/transformers/en/hpo_train) functionality. 100 trials were run. Early stopping was applied during training. The best-performing model was selected using the macro F1 performance on the validation set. The selected hyperparameters are in the table below.
+
+ | Hyperparameter | Value |
+ |----------------|-------|
+ | epochs | 4.0 |
+ | learning_rate | 9.767344966191627e-05 |
+ | per_device_train_batch_size | 16 |
+ | weight_decay | 0.025286446963170207 |
+ | warmup_ratio | 0.021367464793327073 |
+
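For anyone reusing these values, the sketch below shows one way the selected hyperparameters could be turned into keyword arguments for `transformers.TrainingArguments`. It is an assumption about usage, not the training code from the repo: the only renaming is `epochs` to `num_train_epochs`, and `output_dir="out"` in the trailing comment is a hypothetical path. The snippet itself only needs the standard library.

```python
import json

# The selected values from the table above, in JSON form
best_json = """
{
  "epochs": 4.0,
  "learning_rate": 9.767344966191627e-05,
  "per_device_train_batch_size": 16,
  "weight_decay": 0.025286446963170207,
  "warmup_ratio": 0.021367464793327073
}
"""

best = json.loads(best_json)

# Assumed renaming: only "epochs" differs from the TrainingArguments field name
RENAME = {"epochs": "num_train_epochs"}
training_kwargs = {RENAME.get(key, key): value for key, value in best.items()}
print(training_kwargs)

# With transformers installed, these could then be passed on, for example:
# TrainingArguments(output_dir="out", **training_kwargs)
```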
best_hyperparameters.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epochs": 4.0,
+   "learning_rate": 9.767344966191627e-05,
+   "per_device_train_batch_size": 16,
+   "weight_decay": 0.025286446963170207,
+   "warmup_ratio": 0.021367464793327073
+ }
config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "_name_or_path": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
+   "architectures": [
+     "BertForTokenClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "O",
+     "1": "B-Anatomy",
+     "2": "I-Anatomy",
+     "3": "B-Chemicals & Drugs",
+     "4": "I-Chemicals & Drugs",
+     "5": "B-Concepts & Ideas",
+     "6": "I-Concepts & Ideas",
+     "7": "B-Devices",
+     "8": "I-Devices",
+     "9": "B-Disorders",
+     "10": "I-Disorders",
+     "11": "B-Genes & Molecular Sequences",
+     "12": "I-Genes & Molecular Sequences",
+     "13": "B-Geographic Areas",
+     "14": "I-Geographic Areas",
+     "15": "B-Living Beings",
+     "16": "I-Living Beings",
+     "17": "B-Objects",
+     "18": "I-Objects",
+     "19": "B-Occupations",
+     "20": "I-Occupations",
+     "21": "B-Organizations",
+     "22": "I-Organizations",
+     "23": "B-Phenomena",
+     "24": "I-Phenomena",
+     "25": "B-Physiology",
+     "26": "I-Physiology",
+     "27": "B-Procedures",
+     "28": "I-Procedures"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
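The `id2label` map above has 29 entries: one `O` label plus a `B-`/`I-` pair for each of the 14 groups. The sketch below rebuilds that map and decodes a sequence of predicted label ids into token-level spans under IOB2. The helper name `iob2_to_spans` is hypothetical, and treating a stray `I-` tag as the start of a new span is a lenient policy chosen for this sketch; the source does not say how the pipeline or the evaluation handles that case.

```python
GROUPS = [
    "Anatomy", "Chemicals & Drugs", "Concepts & Ideas", "Devices", "Disorders",
    "Genes & Molecular Sequences", "Geographic Areas", "Living Beings", "Objects",
    "Occupations", "Organizations", "Phenomena", "Physiology", "Procedures",
]

# Rebuild id2label in the same order as config.json: O, then B-/I- pairs
id2label = {0: "O"}
for index, group in enumerate(GROUPS):
    id2label[2 * index + 1] = f"B-{group}"
    id2label[2 * index + 2] = f"I-{group}"


def iob2_to_spans(label_ids):
    """Return (group, start, end) token spans, with end exclusive."""
    spans = []
    current = None  # (group, start index) of the span being built
    for i, label_id in enumerate(label_ids):
        prefix, _, group = id2label[label_id].partition("-")
        continues = prefix == "I" and current is not None and current[0] == group
        if continues:
            continue
        # Any other tag closes the open span, if there is one
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None
        # B- tags (and stray I- tags, by assumption) open a new span
        if prefix in ("B", "I"):
            current = (group, i)
    if current is not None:
        spans.append((current[0], current[1], len(label_ids)))
    return spans


# B-Disorders, I-Disorders, I-Disorders, O, B-Chemicals & Drugs, O
print(iob2_to_spans([9, 10, 10, 0, 3, 0]))
```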
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f089f7dbe5fce6283f669a8ca9cb4ed37d330acc16a1cc129b4086edbc54404e
+ size 435679140
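The three lines above are a Git LFS pointer rather than the weights themselves. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper. The parameter estimate assumes 4 bytes per weight (the config lists `float32`) and ignores the safetensors header, so it is only a rough upper bound.

```python
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f089f7dbe5fce6283f669a8ca9cb4ed37d330acc16a1cc129b4086edbc54404e
size 435679140
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_mib = pointer["size_bytes"] / 2**20
approx_params = pointer["size_bytes"] / 4  # assumes float32, ignores the file header

print(f"{size_mib:.1f} MiB, roughly {approx_params / 1e6:.1f}M parameters")
```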
performance_report.json ADDED
@@ -0,0 +1,869 @@
1
+ {
2
+ "train": {
3
+ "token_level": {
4
+ "O": {
5
+ "precision": 0.9720447712640268,
6
+ "recall": 0.976257153730711,
7
+ "f1-score": 0.9741464087457614,
8
+ "support": 557052.0
9
+ },
10
+ "B-Anatomy": {
11
+ "precision": 0.919032967032967,
12
+ "recall": 0.9125349162011173,
13
+ "f1-score": 0.9157724146993124,
14
+ "support": 11456.0
15
+ },
16
+ "I-Anatomy": {
17
+ "precision": 0.9173255246461689,
18
+ "recall": 0.9449024733561231,
19
+ "f1-score": 0.9309098113020653,
20
+ "support": 9946.0
21
+ },
22
+ "B-Chemicals & Drugs": {
23
+ "precision": 0.9425027164070989,
24
+ "recall": 0.9337938458778147,
25
+ "f1-score": 0.9381280699382633,
26
+ "support": 22294.0
27
+ },
28
+ "I-Chemicals & Drugs": {
29
+ "precision": 0.9631471190406552,
30
+ "recall": 0.9549233148601624,
31
+ "f1-score": 0.9590175870047082,
32
+ "support": 31036.0
33
+ },
34
+ "B-Concepts & Ideas": {
35
+ "precision": 0.8773684210526316,
36
+ "recall": 0.6520633678857813,
37
+ "f1-score": 0.74812072254011,
38
+ "support": 10226.0
39
+ },
40
+ "I-Concepts & Ideas": {
41
+ "precision": 0.8836440877979702,
42
+ "recall": 0.8557714285714285,
43
+ "f1-score": 0.8694844403158384,
44
+ "support": 8750.0
45
+ },
46
+ "B-Devices": {
47
+ "precision": 0.8684701492537313,
48
+ "recall": 0.8005159071367154,
49
+ "f1-score": 0.8331096196868009,
50
+ "support": 1163.0
51
+ },
52
+ "I-Devices": {
53
+ "precision": 0.8502097064110246,
54
+ "recall": 0.8857677902621723,
55
+ "f1-score": 0.8676245796392541,
56
+ "support": 1602.0
57
+ },
58
+ "B-Disorders": {
59
+ "precision": 0.9004460226042523,
60
+ "recall": 0.8502964628910243,
61
+ "f1-score": 0.8746529822495163,
62
+ "support": 24455.0
63
+ },
64
+ "I-Disorders": {
65
+ "precision": 0.8971253717191743,
66
+ "recall": 0.9068571926461619,
67
+ "f1-score": 0.9019650323894534,
68
+ "support": 22954.0
69
+ },
70
+ "B-Genes & Molecular Sequences": {
71
+ "precision": 0.8677474402730375,
72
+ "recall": 0.9045360213459828,
73
+ "f1-score": 0.8857599070982726,
74
+ "support": 3373.0
75
+ },
76
+ "I-Genes & Molecular Sequences": {
77
+ "precision": 0.8909386869234943,
78
+ "recall": 0.9354348651728067,
79
+ "f1-score": 0.9126447429365447,
80
+ "support": 5266.0
81
+ },
82
+ "B-Geographic Areas": {
83
+ "precision": 0.9166666666666666,
84
+ "recall": 0.9500587544065805,
85
+ "f1-score": 0.933064050778996,
86
+ "support": 1702.0
87
+ },
88
+ "I-Geographic Areas": {
89
+ "precision": 0.9370424597364568,
90
+ "recall": 0.9343065693430657,
91
+ "f1-score": 0.935672514619883,
92
+ "support": 1370.0
93
+ },
94
+ "B-Living Beings": {
95
+ "precision": 0.9251754385964912,
96
+ "recall": 0.9346034559149313,
97
+ "f1-score": 0.9298655499228565,
98
+ "support": 11285.0
99
+ },
100
+ "I-Living Beings": {
101
+ "precision": 0.9578313253012049,
102
+ "recall": 0.9625474210993623,
103
+ "f1-score": 0.9601835822698176,
104
+ "support": 12389.0
105
+ },
106
+ "B-Objects": {
107
+ "precision": 0.8566810344827587,
108
+ "recall": 0.9555288461538461,
109
+ "f1-score": 0.9034090909090909,
110
+ "support": 832.0
111
+ },
112
+ "I-Objects": {
113
+ "precision": 0.9041095890410958,
114
+ "recall": 0.9361702127659575,
115
+ "f1-score": 0.9198606271777003,
116
+ "support": 705.0
117
+ },
118
+ "B-Occupations": {
119
+ "precision": 0.6485148514851485,
120
+ "recall": 0.7586872586872587,
121
+ "f1-score": 0.699288256227758,
122
+ "support": 518.0
123
+ },
124
+ "I-Occupations": {
125
+ "precision": 0.6,
126
+ "recall": 0.7557251908396947,
127
+ "f1-score": 0.668918918918919,
128
+ "support": 393.0
129
+ },
130
+ "B-Organizations": {
131
+ "precision": 0.8365045806906272,
132
+ "recall": 0.9158950617283951,
133
+ "f1-score": 0.874401473296501,
134
+ "support": 1296.0
135
+ },
136
+ "I-Organizations": {
137
+ "precision": 0.8566362715298885,
138
+ "recall": 0.9575311438278595,
139
+ "f1-score": 0.9042780748663102,
140
+ "support": 1766.0
141
+ },
142
+ "B-Phenomena": {
143
+ "precision": 0.5407685098406748,
144
+ "recall": 0.5133451957295374,
145
+ "f1-score": 0.5267001369237791,
146
+ "support": 1124.0
147
+ },
148
+ "I-Phenomena": {
149
+ "precision": 0.5816203143893591,
150
+ "recall": 0.4863498483316481,
151
+ "f1-score": 0.5297356828193832,
152
+ "support": 989.0
153
+ },
154
+ "B-Physiology": {
155
+ "precision": 0.8493248045486852,
156
+ "recall": 0.8194753985942054,
157
+ "f1-score": 0.8341331471948347,
158
+ "support": 11666.0
159
+ },
160
+ "I-Physiology": {
161
+ "precision": 0.8103037213641546,
162
+ "recall": 0.8807887323943662,
163
+ "f1-score": 0.8440773134650685,
164
+ "support": 8875.0
165
+ },
166
+ "B-Procedures": {
167
+ "precision": 0.8294550810014728,
168
+ "recall": 0.8438717410848067,
169
+ "f1-score": 0.8366013071895425,
170
+ "support": 20022.0
171
+ },
172
+ "I-Procedures": {
173
+ "precision": 0.8911117462132038,
174
+ "recall": 0.8955909808990378,
175
+ "f1-score": 0.8933457488718574,
176
+ "support": 20889.0
177
+ },
178
+ "accuracy": 0.9494197870855755,
179
+ "macro avg": {
180
+ "precision": 0.8514396337694524,
181
+ "recall": 0.8625562259220192,
182
+ "f1-score": 0.8553404066895931,
183
+ "support": 805394.0
184
+ },
185
+ "weighted avg": {
186
+ "precision": 0.9492471775259178,
187
+ "recall": 0.9494197870855755,
188
+ "f1-score": 0.9490351220280436,
189
+ "support": 805394.0
190
+ }
191
+ },
192
+ "span_level": {
193
+ "Anatomy": {
194
+ "precision": 0.8700173310225303,
195
+ "recall": 0.872360761143453,
196
+ "f1-score": 0.8711874701722417,
197
+ "support": 11509
198
+ },
199
+ "Chemicals & Drugs": {
200
+ "precision": 0.9019501380028053,
201
+ "recall": 0.8886412268188303,
202
+ "f1-score": 0.8952462219028585,
203
+ "support": 22432
204
+ },
205
+ "Concepts & Ideas": {
206
+ "precision": 0.7898267870212247,
207
+ "recall": 0.630293001070768,
208
+ "f1-score": 0.7010990200855395,
209
+ "support": 10273
210
+ },
211
+ "Devices": {
212
+ "precision": 0.8045178105994787,
213
+ "recall": 0.7914529914529914,
214
+ "f1-score": 0.7979319258940112,
215
+ "support": 1170
216
+ },
217
+ "Disorders": {
218
+ "precision": 0.8474890461745871,
219
+ "recall": 0.8165949500690103,
220
+ "f1-score": 0.8317552201777961,
221
+ "support": 24634
222
+ },
223
+ "Genes & Molecular Sequences": {
224
+ "precision": 0.8086003372681282,
225
+ "recall": 0.8506800709639266,
226
+ "f1-score": 0.829106628242075,
227
+ "support": 3382
228
+ },
229
+ "Geographic Areas": {
230
+ "precision": 0.876410835214447,
231
+ "recall": 0.907126168224299,
232
+ "f1-score": 0.89150401836969,
233
+ "support": 1712
234
+ },
235
+ "Living Beings": {
236
+ "precision": 0.8884203127745564,
237
+ "recall": 0.8879522304179839,
238
+ "f1-score": 0.8881862099253405,
239
+ "support": 11388
240
+ },
241
+ "Objects": {
242
+ "precision": 0.8064171122994652,
243
+ "recall": 0.8860164512338425,
244
+ "f1-score": 0.8443449048152295,
245
+ "support": 851
246
+ },
247
+ "Occupations": {
248
+ "precision": 0.5763239875389408,
249
+ "recall": 0.7088122605363985,
250
+ "f1-score": 0.6357388316151202,
251
+ "support": 522
252
+ },
253
+ "Organizations": {
254
+ "precision": 0.7973811164713991,
255
+ "recall": 0.8845565749235474,
256
+ "f1-score": 0.8387096774193549,
257
+ "support": 1308
258
+ },
259
+ "Phenomena": {
260
+ "precision": 0.4477234401349072,
261
+ "recall": 0.46949602122015915,
262
+ "f1-score": 0.45835131635735865,
263
+ "support": 1131
264
+ },
265
+ "Physiology": {
266
+ "precision": 0.7840562521179262,
267
+ "recall": 0.7892717039058502,
268
+ "f1-score": 0.7866553336166596,
269
+ "support": 11726
270
+ },
271
+ "Procedures": {
272
+ "precision": 0.7731168893358233,
273
+ "recall": 0.801439563167039,
274
+ "f1-score": 0.7870234961489714,
275
+ "support": 20145
276
+ },
277
+ "macro avg": {
278
+ "precision": 0.78373224256973,
279
+ "recall": 0.7989067125105784,
280
+ "f1-score": 0.7897743053387318,
281
+ "support": 122183
282
+ },
283
+ "weighted avg": {
284
+ "precision": 0.8334626249065854,
285
+ "recall": 0.8204496533887693,
286
+ "f1-score": 0.8260050590679613,
287
+ "support": 122183
288
+ }
289
+ }
290
+ },
291
+ "val": {
292
+ "token_level": {
293
+ "O": {
294
+ "precision": 0.930847233100809,
295
+ "recall": 0.9466419326729286,
296
+ "f1-score": 0.9386781450775005,
297
+ "support": 187057.0
298
+ },
299
+ "B-Anatomy": {
300
+ "precision": 0.7610457516339869,
301
+ "recall": 0.7537545313309166,
302
+ "f1-score": 0.757382593989853,
303
+ "support": 3862.0
304
+ },
305
+ "I-Anatomy": {
306
+ "precision": 0.7202499289974439,
307
+ "recall": 0.7752980739834913,
308
+ "f1-score": 0.7467608951707891,
309
+ "support": 3271.0
310
+ },
311
+ "B-Chemicals & Drugs": {
312
+ "precision": 0.8210823909531503,
313
+ "recall": 0.823188014576866,
314
+ "f1-score": 0.8221338545528072,
315
+ "support": 7409.0
316
+ },
317
+ "I-Chemicals & Drugs": {
318
+ "precision": 0.8563915857605178,
319
+ "recall": 0.8465460361891433,
320
+ "f1-score": 0.8514403499069931,
321
+ "support": 10003.0
322
+ },
323
+ "B-Concepts & Ideas": {
324
+ "precision": 0.5897058823529412,
325
+ "recall": 0.3599640933572711,
326
+ "f1-score": 0.44704570791527315,
327
+ "support": 3342.0
328
+ },
329
+ "I-Concepts & Ideas": {
330
+ "precision": 0.5564516129032258,
331
+ "recall": 0.45887294364718234,
332
+ "f1-score": 0.5029733358910417,
333
+ "support": 2857.0
334
+ },
335
+ "B-Devices": {
336
+ "precision": 0.6964856230031949,
337
+ "recall": 0.44308943089430897,
338
+ "f1-score": 0.5416149068322982,
339
+ "support": 492.0
340
+ },
341
+ "I-Devices": {
342
+ "precision": 0.627677100494234,
343
+ "recall": 0.5537790697674418,
344
+ "f1-score": 0.5884169884169884,
345
+ "support": 688.0
346
+ },
347
+ "B-Disorders": {
348
+ "precision": 0.7699307347548554,
349
+ "recall": 0.6881524641903375,
350
+ "f1-score": 0.7267482853663226,
351
+ "support": 8238.0
352
+ },
353
+ "I-Disorders": {
354
+ "precision": 0.7285814116002796,
355
+ "recall": 0.7107990182710663,
356
+ "f1-score": 0.7195803713161709,
357
+ "support": 7334.0
358
+ },
359
+ "B-Genes & Molecular Sequences": {
360
+ "precision": 0.6427840327533265,
361
+ "recall": 0.6562173458725182,
362
+ "f1-score": 0.6494312306101344,
363
+ "support": 957.0
364
+ },
365
+ "I-Genes & Molecular Sequences": {
366
+ "precision": 0.6770573566084788,
367
+ "recall": 0.71026814911707,
368
+ "f1-score": 0.6932652409830833,
369
+ "support": 1529.0
370
+ },
371
+ "B-Geographic Areas": {
372
+ "precision": 0.7933042212518195,
373
+ "recall": 0.8086053412462908,
374
+ "f1-score": 0.8008817046289493,
375
+ "support": 674.0
376
+ },
377
+ "I-Geographic Areas": {
378
+ "precision": 0.7455197132616488,
379
+ "recall": 0.7675276752767528,
380
+ "f1-score": 0.7563636363636363,
381
+ "support": 542.0
382
+ },
383
+ "B-Living Beings": {
384
+ "precision": 0.79478672985782,
385
+ "recall": 0.8014336917562724,
386
+ "f1-score": 0.7980963712076146,
387
+ "support": 4185.0
388
+ },
389
+ "I-Living Beings": {
390
+ "precision": 0.8440483768300445,
391
+ "recall": 0.8202061855670103,
392
+ "f1-score": 0.8319564990065879,
393
+ "support": 4850.0
394
+ },
395
+ "B-Objects": {
396
+ "precision": 0.61,
397
+ "recall": 0.71484375,
398
+ "f1-score": 0.658273381294964,
399
+ "support": 256.0
400
+ },
401
+ "I-Objects": {
402
+ "precision": 0.6748971193415638,
403
+ "recall": 0.6721311475409836,
404
+ "f1-score": 0.6735112936344969,
405
+ "support": 244.0
406
+ },
407
+ "B-Occupations": {
408
+ "precision": 0.5208333333333334,
409
+ "recall": 0.5076142131979695,
410
+ "f1-score": 0.5141388174807198,
411
+ "support": 197.0
412
+ },
413
+ "I-Occupations": {
414
+ "precision": 0.47368421052631576,
415
+ "recall": 0.36416184971098264,
416
+ "f1-score": 0.4117647058823529,
417
+ "support": 173.0
418
+ },
419
+ "B-Organizations": {
420
+ "precision": 0.579476861167002,
421
+ "recall": 0.6501128668171557,
422
+ "f1-score": 0.6127659574468085,
423
+ "support": 443.0
424
+ },
425
+ "I-Organizations": {
426
+ "precision": 0.6396526772793053,
427
+ "recall": 0.7106109324758842,
428
+ "f1-score": 0.6732673267326733,
429
+ "support": 622.0
430
+ },
431
+ "B-Phenomena": {
432
+ "precision": 0.34057971014492755,
433
+ "recall": 0.34306569343065696,
434
+ "f1-score": 0.3418181818181818,
435
+ "support": 274.0
436
+ },
437
+ "I-Phenomena": {
438
+ "precision": 0.28451882845188287,
439
+ "recall": 0.32075471698113206,
440
+ "f1-score": 0.30155210643015523,
441
+ "support": 212.0
442
+ },
443
+ "B-Physiology": {
444
+ "precision": 0.6399317406143344,
445
+ "recall": 0.6038647342995169,
446
+ "f1-score": 0.6213753106876554,
447
+ "support": 3726.0
448
+ },
449
+ "I-Physiology": {
450
+ "precision": 0.5488801990757198,
451
+ "recall": 0.5744047619047619,
452
+ "f1-score": 0.5613524813670242,
453
+ "support": 2688.0
454
+ },
455
+ "B-Procedures": {
456
+ "precision": 0.6613690007867821,
457
+ "recall": 0.6512240471025721,
458
+ "f1-score": 0.6562573190725272,
459
+ "support": 6454.0
460
+ },
461
+ "I-Procedures": {
462
+ "precision": 0.6790007806401249,
463
+ "recall": 0.6622506471752703,
464
+ "f1-score": 0.6705211224175146,
465
+ "support": 6567.0
466
+ },
467
+ "accuracy": 0.8725375818329085,
468
+ "macro avg": {
469
+ "precision": 0.6623715223268644,
470
+ "recall": 0.6448063227018537,
471
+ "f1-score": 0.6506678662586591,
472
+ "support": 269146.0
473
+ },
474
+ "weighted avg": {
475
+ "precision": 0.8694071423336295,
476
+ "recall": 0.8725375818329085,
477
+ "f1-score": 0.8703500500944416,
478
+ "support": 269146.0
479
+ }
480
+ },
481
+ "span_level": {
482
+ "Anatomy": {
483
+ "precision": 0.6821275523065289,
484
+ "recall": 0.6972429786137594,
485
+ "f1-score": 0.6896024464831805,
486
+ "support": 3881
487
+ },
488
+ "Chemicals & Drugs": {
489
+ "precision": 0.7514910536779325,
490
+ "recall": 0.7570093457943925,
491
+ "f1-score": 0.7542401064183571,
492
+ "support": 7490
493
+ },
494
+ "Concepts & Ideas": {
495
+ "precision": 0.4873985476292183,
496
+ "recall": 0.33907875185735514,
497
+ "f1-score": 0.39992989835261133,
498
+ "support": 3365
499
+ },
500
+ "Devices": {
501
+ "precision": 0.6056338028169014,
502
+ "recall": 0.43610547667342797,
503
+ "f1-score": 0.5070754716981132,
504
+ "support": 493
505
+ },
506
+ "Disorders": {
507
+ "precision": 0.7004692387904067,
508
+ "recall": 0.6459134615384615,
509
+ "f1-score": 0.6720860430215108,
510
+ "support": 8320
511
+ },
512
+ "Genes & Molecular Sequences": {
513
+ "precision": 0.5463414634146342,
514
+ "recall": 0.5803108808290155,
515
+ "f1-score": 0.5628140703517589,
516
+ "support": 965
517
+ },
518
+ "Geographic Areas": {
519
+ "precision": 0.7474600870827286,
520
+ "recall": 0.7595870206489675,
521
+ "f1-score": 0.753474762253109,
522
+ "support": 678
523
+ },
524
+ "Living Beings": {
525
+ "precision": 0.7159887798036466,
526
+ "recall": 0.7244560075685903,
527
+ "f1-score": 0.7201975076416647,
528
+ "support": 4228
529
+ },
530
+ "Objects": {
531
+ "precision": 0.551948051948052,
532
+ "recall": 0.6563706563706564,
533
+ "f1-score": 0.599647266313933,
534
+ "support": 259
535
+ },
536
+ "Occupations": {
537
+ "precision": 0.46568627450980393,
538
+ "recall": 0.4797979797979798,
539
+ "f1-score": 0.472636815920398,
540
+ "support": 198
541
+ },
542
+ "Organizations": {
543
+ "precision": 0.5019157088122606,
544
+ "recall": 0.5783664459161147,
545
+ "f1-score": 0.5374358974358974,
546
+ "support": 453
547
+ },
548
+ "Phenomena": {
549
+ "precision": 0.27672955974842767,
550
+ "recall": 0.32116788321167883,
551
+ "f1-score": 0.29729729729729726,
552
+ "support": 274
553
+ },
554
+ "Physiology": {
555
+ "precision": 0.5590094836670179,
556
+ "recall": 0.5657158091175687,
557
+ "f1-score": 0.562342652709686,
558
+ "support": 3751
559
+ },
560
+ "Procedures": {
561
+ "precision": 0.5877219380078242,
562
+ "recall": 0.6000921800583807,
563
+ "f1-score": 0.5938426453819841,
564
+ "support": 6509
565
+ },
566
+ "macro avg": {
567
+ "precision": 0.5842801101582417,
568
+ "recall": 0.5815153484283105,
569
+ "f1-score": 0.5801873486628216,
570
+ "support": 40864
571
+ },
572
+ "weighted avg": {
573
+ "precision": 0.6500699757755491,
574
+ "recall": 0.6334915818324197,
575
+ "f1-score": 0.6401859322619969,
576
+ "support": 40864
577
+ }
578
+ }
579
+ },
580
+ "test": {
581
+ "token_level": {
582
+ "O": {
583
+ "precision": 0.9327288328920794,
584
+ "recall": 0.9450010602205259,
585
+ "f1-score": 0.9388248429279391,
586
+ "support": 188640.0
587
+ },
588
+ "B-Anatomy": {
589
+ "precision": 0.7327746741154563,
590
+ "recall": 0.7246777163904236,
591
+ "f1-score": 0.7287037037037037,
592
+ "support": 3258.0
593
+ },
594
+ "I-Anatomy": {
595
+ "precision": 0.7148125384142594,
596
+ "recall": 0.7663920922570017,
597
+ "f1-score": 0.7397042455080299,
598
+ "support": 3035.0
599
+ },
600
+ "B-Chemicals & Drugs": {
601
+ "precision": 0.8137902559867878,
602
+ "recall": 0.8030694010593508,
603
+ "f1-score": 0.8083942853236722,
604
+ "support": 7363.0
605
+ },
606
+ "I-Chemicals & Drugs": {
607
+ "precision": 0.8251571052098114,
608
+ "recall": 0.8428408737964592,
609
+ "f1-score": 0.8339052496798975,
610
+ "support": 9659.0
611
+ },
612
+ "B-Concepts & Ideas": {
613
+ "precision": 0.6245535714285714,
614
+ "recall": 0.3821360284075389,
615
+ "f1-score": 0.47415692255549907,
616
+ "support": 3661.0
617
+ },
618
+ "I-Concepts & Ideas": {
619
+ "precision": 0.5620985010706638,
620
+ "recall": 0.5108660395718456,
621
+ "f1-score": 0.5352591333899746,
622
+ "support": 3083.0
623
+ },
624
+ "B-Devices": {
625
+ "precision": 0.5829596412556054,
626
+ "recall": 0.3672316384180791,
627
+ "f1-score": 0.4506065857885615,
628
+ "support": 354.0
629
+ },
630
+ "I-Devices": {
631
+ "precision": 0.4653284671532847,
632
+ "recall": 0.450530035335689,
633
+ "f1-score": 0.4578096947935368,
634
+ "support": 566.0
635
+ },
636
+ "B-Disorders": {
637
+ "precision": 0.7646812665643744,
638
+ "recall": 0.6812476699391078,
639
+ "f1-score": 0.7205573080967402,
640
+ "support": 8047.0
641
+ },
642
+ "I-Disorders": {
643
+ "precision": 0.7384636639955788,
644
+ "recall": 0.7031969477700303,
645
+ "f1-score": 0.7203989487162208,
646
+ "support": 7601.0
647
+ },
648
+ "B-Genes & Molecular Sequences": {
649
+ "precision": 0.6082304526748972,
650
+ "recall": 0.6651665166516652,
651
+ "f1-score": 0.6354256233877902,
652
+ "support": 1111.0
653
+ },
654
+ "I-Genes & Molecular Sequences": {
655
+ "precision": 0.6074357572443958,
656
+ "recall": 0.6558441558441559,
657
+ "f1-score": 0.6307124609707635,
658
+ "support": 1694.0
659
+ },
660
+ "B-Geographic Areas": {
661
+ "precision": 0.7287878787878788,
662
+ "recall": 0.8097643097643098,
663
+ "f1-score": 0.7671451355661882,
664
+ "support": 594.0
665
+ },
666
+ "I-Geographic Areas": {
667
+ "precision": 0.720226843100189,
668
+ "recall": 0.6840215439856373,
669
+ "f1-score": 0.7016574585635359,
670
+ "support": 557.0
671
+ },
672
+ "B-Living Beings": {
673
+ "precision": 0.7869297163995068,
674
+ "recall": 0.8009538152610441,
675
+ "f1-score": 0.7938798358004727,
676
+ "support": 3984.0
677
+ },
678
+ "I-Living Beings": {
679
+ "precision": 0.8368229403732362,
680
+ "recall": 0.8145768719539211,
681
+ "f1-score": 0.8255500673551863,
682
+ "support": 4514.0
683
+ },
684
+ "B-Objects": {
685
+ "precision": 0.5885558583106267,
686
+ "recall": 0.6447761194029851,
687
+ "f1-score": 0.6153846153846154,
688
+ "support": 335.0
689
+ },
690
+ "I-Objects": {
691
+ "precision": 0.6981818181818182,
692
+ "recall": 0.6421404682274248,
693
+ "f1-score": 0.6689895470383276,
694
+ "support": 299.0
695
+ },
696
+ "B-Occupations": {
697
+ "precision": 0.44398340248962653,
698
+ "recall": 0.5487179487179488,
699
+ "f1-score": 0.4908256880733945,
700
+ "support": 195.0
701
+ },
702
+ "I-Occupations": {
703
+ "precision": 0.3793103448275862,
704
+ "recall": 0.5076923076923077,
705
+ "f1-score": 0.4342105263157895,
706
+ "support": 130.0
707
+ },
708
+ "B-Organizations": {
709
+ "precision": 0.5512820512820513,
710
+ "recall": 0.675392670157068,
711
+ "f1-score": 0.6070588235294118,
712
+ "support": 382.0
713
+ },
714
+ "I-Organizations": {
715
+ "precision": 0.5950653120464441,
716
+ "recall": 0.8023483365949119,
717
+ "f1-score": 0.6833333333333333,
718
+ "support": 511.0
719
+ },
720
+ "B-Phenomena": {
721
+ "precision": 0.2920962199312715,
722
+ "recall": 0.32075471698113206,
723
+ "f1-score": 0.3057553956834532,
724
+ "support": 265.0
725
+ },
726
+ "I-Phenomena": {
727
+ "precision": 0.2720306513409962,
728
+ "recall": 0.3397129186602871,
729
+ "f1-score": 0.3021276595744681,
730
+ "support": 209.0
731
+ },
732
+ "B-Physiology": {
733
+ "precision": 0.6418974499588703,
734
+ "recall": 0.6137912952281069,
735
+ "f1-score": 0.6275298217397132,
736
+ "support": 3814.0
737
+ },
738
+ "I-Physiology": {
739
+ "precision": 0.5521588402143083,
740
+ "recall": 0.6245989304812835,
741
+ "f1-score": 0.5861492137838742,
742
+ "support": 2805.0
743
+ },
744
+ "B-Procedures": {
745
+ "precision": 0.6753959542104437,
746
+ "recall": 0.6567551082647148,
747
+ "f1-score": 0.6659451101662157,
748
+ "support": 6558.0
749
+ },
750
+ "I-Procedures": {
751
+ "precision": 0.7067256367241116,
752
+ "recall": 0.6688799076212472,
753
+ "f1-score": 0.6872821653689284,
754
+ "support": 6928.0
755
+ },
756
+ "accuracy": 0.870661701560603,
757
+ "macro avg": {
758
+ "precision": 0.6359470912477494,
759
+ "recall": 0.6432095670571105,
760
+ "f1-score": 0.6357683931765253,
761
+ "support": 270152.0
762
+ },
763
+ "weighted avg": {
764
+ "precision": 0.8687298439825012,
765
+ "recall": 0.870661701560603,
766
+ "f1-score": 0.8690103283416164,
767
+ "support": 270152.0
768
+ }
769
+ },
770
+ "span_level": {
+ "Anatomy": {
+ "precision": 0.6564270802266627,
+ "recall": 0.67165090021361,
+ "f1-score": 0.6639517345399698,
+ "support": 3277
+ },
+ "Chemicals & Drugs": {
+ "precision": 0.748099891422367,
+ "recall": 0.745066234117329,
+ "f1-score": 0.7465799810375187,
+ "support": 7398
+ },
+ "Concepts & Ideas": {
+ "precision": 0.5149451381006432,
+ "recall": 0.36953570458865054,
+ "f1-score": 0.4302877015491622,
+ "support": 3683
+ },
+ "Devices": {
+ "precision": 0.44745762711864406,
+ "recall": 0.37183098591549296,
+ "f1-score": 0.40615384615384614,
+ "support": 355
+ },
+ "Disorders": {
+ "precision": 0.6907038512616201,
+ "recall": 0.6413861141941053,
+ "f1-score": 0.6651320416906452,
+ "support": 8109
+ },
+ "Genes & Molecular Sequences": {
+ "precision": 0.5060048038430744,
+ "recall": 0.5668161434977579,
+ "f1-score": 0.5346869712351946,
+ "support": 1115
+ },
+ "Geographic Areas": {
+ "precision": 0.6712328767123288,
+ "recall": 0.7374581939799331,
+ "f1-score": 0.7027888446215139,
+ "support": 598
+ },
+ "Living Beings": {
+ "precision": 0.7177242888402626,
+ "recall": 0.7391086629944917,
+ "f1-score": 0.7282595288022696,
+ "support": 3994
+ },
+ "Objects": {
+ "precision": 0.5180412371134021,
+ "recall": 0.5982142857142857,
+ "f1-score": 0.5552486187845304,
+ "support": 336
+ },
+ "Occupations": {
+ "precision": 0.3671875,
+ "recall": 0.47959183673469385,
+ "f1-score": 0.415929203539823,
+ "support": 196
+ },
+ "Organizations": {
+ "precision": 0.5041666666666667,
+ "recall": 0.6335078534031413,
+ "f1-score": 0.5614849187935034,
+ "support": 382
+ },
+ "Phenomena": {
+ "precision": 0.2056338028169014,
+ "recall": 0.27137546468401486,
+ "f1-score": 0.23397435897435898,
+ "support": 269
+ },
+ "Physiology": {
+ "precision": 0.5598194130925508,
+ "recall": 0.5823115053482911,
+ "f1-score": 0.570843989769821,
+ "support": 3833
+ },
+ "Procedures": {
+ "precision": 0.5965460771177609,
+ "recall": 0.607213214123352,
+ "f1-score": 0.6018323820967258,
+ "support": 6599
+ },
+ "macro avg": {
+ "precision": 0.5502850181666347,
+ "recall": 0.5725047928220821,
+ "f1-score": 0.558368151542063,
+ "support": 40144
+ },
+ "weighted avg": {
+ "precision": 0.6414502549457723,
+ "recall": 0.6297578716620167,
+ "f1-score": 0.6340080620691133,
+ "support": 40144
+ }
+ }
+ }
+ }
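Each `f1-score` in the report above should be the harmonic mean of the `precision` and `recall` listed next to it. The sketch below checks that for the span-level "Anatomy" entry. The helper name `f1_score` is an illustrative assumption and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Span-level "Anatomy" values copied from the JSON report above.
anatomy = {
    "precision": 0.6564270802266627,
    "recall": 0.67165090021361,
    "f1-score": 0.6639517345399698,
}

recomputed = f1_score(anatomy["precision"], anatomy["recall"])
print(f"reported={anatomy['f1-score']:.6f} recomputed={recomputed:.6f}")
```

The same check can be run over any other label in the report by swapping in its values.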
performance_report.md ADDED
@@ -0,0 +1,180 @@
+ # Performance on Training Set
+
+ ## Span Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | Anatomy | 0.870 | 0.872 | 0.871 | 11509 |
+ | Chemicals & Drugs | 0.902 | 0.889 | 0.895 | 22432 |
+ | Concepts & Ideas | 0.790 | 0.630 | 0.701 | 10273 |
+ | Devices | 0.805 | 0.791 | 0.798 | 1170 |
+ | Disorders | 0.847 | 0.817 | 0.832 | 24634 |
+ | Genes & Molecular Sequences | 0.809 | 0.851 | 0.829 | 3382 |
+ | Geographic Areas | 0.876 | 0.907 | 0.892 | 1712 |
+ | Living Beings | 0.888 | 0.888 | 0.888 | 11388 |
+ | Objects | 0.806 | 0.886 | 0.844 | 851 |
+ | Occupations | 0.576 | 0.709 | 0.636 | 522 |
+ | Organizations | 0.797 | 0.885 | 0.839 | 1308 |
+ | Phenomena | 0.448 | 0.469 | 0.458 | 1131 |
+ | Physiology | 0.784 | 0.789 | 0.787 | 11726 |
+ | Procedures | 0.773 | 0.801 | 0.787 | 20145 |
+ | macro avg | 0.784 | 0.799 | 0.790 | 122183 |
+ | weighted avg | 0.833 | 0.820 | 0.826 | 122183 |
+
+ ## Token Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | O | 0.972 | 0.976 | 0.974 | 557052 |
+ | B-Anatomy | 0.919 | 0.913 | 0.916 | 11456 |
+ | I-Anatomy | 0.917 | 0.945 | 0.931 | 9946 |
+ | B-Chemicals & Drugs | 0.943 | 0.934 | 0.938 | 22294 |
+ | I-Chemicals & Drugs | 0.963 | 0.955 | 0.959 | 31036 |
+ | B-Concepts & Ideas | 0.877 | 0.652 | 0.748 | 10226 |
+ | I-Concepts & Ideas | 0.884 | 0.856 | 0.869 | 8750 |
+ | B-Devices | 0.868 | 0.801 | 0.833 | 1163 |
+ | I-Devices | 0.850 | 0.886 | 0.868 | 1602 |
+ | B-Disorders | 0.900 | 0.850 | 0.875 | 24455 |
+ | I-Disorders | 0.897 | 0.907 | 0.902 | 22954 |
+ | B-Genes & Molecular Sequences | 0.868 | 0.905 | 0.886 | 3373 |
+ | I-Genes & Molecular Sequences | 0.891 | 0.935 | 0.913 | 5266 |
+ | B-Geographic Areas | 0.917 | 0.950 | 0.933 | 1702 |
+ | I-Geographic Areas | 0.937 | 0.934 | 0.936 | 1370 |
+ | B-Living Beings | 0.925 | 0.935 | 0.930 | 11285 |
+ | I-Living Beings | 0.958 | 0.963 | 0.960 | 12389 |
+ | B-Objects | 0.857 | 0.956 | 0.903 | 832 |
+ | I-Objects | 0.904 | 0.936 | 0.920 | 705 |
+ | B-Occupations | 0.649 | 0.759 | 0.699 | 518 |
+ | I-Occupations | 0.600 | 0.756 | 0.669 | 393 |
+ | B-Organizations | 0.837 | 0.916 | 0.874 | 1296 |
+ | I-Organizations | 0.857 | 0.958 | 0.904 | 1766 |
+ | B-Phenomena | 0.541 | 0.513 | 0.527 | 1124 |
+ | I-Phenomena | 0.582 | 0.486 | 0.530 | 989 |
+ | B-Physiology | 0.849 | 0.819 | 0.834 | 11666 |
+ | I-Physiology | 0.810 | 0.881 | 0.844 | 8875 |
+ | B-Procedures | 0.829 | 0.844 | 0.837 | 20022 |
+ | I-Procedures | 0.891 | 0.896 | 0.893 | 20889 |
+ | macro avg | 0.851 | 0.863 | 0.855 | 805394 |
+ | weighted avg | 0.949 | 0.949 | 0.949 | 805394 |
+
+
+ # Performance on Validation Set
+
+ ## Span Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | Anatomy | 0.682 | 0.697 | 0.690 | 3881 |
+ | Chemicals & Drugs | 0.751 | 0.757 | 0.754 | 7490 |
+ | Concepts & Ideas | 0.487 | 0.339 | 0.400 | 3365 |
+ | Devices | 0.606 | 0.436 | 0.507 | 493 |
+ | Disorders | 0.700 | 0.646 | 0.672 | 8320 |
+ | Genes & Molecular Sequences | 0.546 | 0.580 | 0.563 | 965 |
+ | Geographic Areas | 0.747 | 0.760 | 0.753 | 678 |
+ | Living Beings | 0.716 | 0.724 | 0.720 | 4228 |
+ | Objects | 0.552 | 0.656 | 0.600 | 259 |
+ | Occupations | 0.466 | 0.480 | 0.473 | 198 |
+ | Organizations | 0.502 | 0.578 | 0.537 | 453 |
+ | Phenomena | 0.277 | 0.321 | 0.297 | 274 |
+ | Physiology | 0.559 | 0.566 | 0.562 | 3751 |
+ | Procedures | 0.588 | 0.600 | 0.594 | 6509 |
+ | macro avg | 0.584 | 0.582 | 0.580 | 40864 |
+ | weighted avg | 0.650 | 0.633 | 0.640 | 40864 |
+
+ ## Token Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | O | 0.931 | 0.947 | 0.939 | 187057 |
+ | B-Anatomy | 0.761 | 0.754 | 0.757 | 3862 |
+ | I-Anatomy | 0.720 | 0.775 | 0.747 | 3271 |
+ | B-Chemicals & Drugs | 0.821 | 0.823 | 0.822 | 7409 |
+ | I-Chemicals & Drugs | 0.856 | 0.847 | 0.851 | 10003 |
+ | B-Concepts & Ideas | 0.590 | 0.360 | 0.447 | 3342 |
+ | I-Concepts & Ideas | 0.556 | 0.459 | 0.503 | 2857 |
+ | B-Devices | 0.696 | 0.443 | 0.542 | 492 |
+ | I-Devices | 0.628 | 0.554 | 0.588 | 688 |
+ | B-Disorders | 0.770 | 0.688 | 0.727 | 8238 |
+ | I-Disorders | 0.729 | 0.711 | 0.720 | 7334 |
+ | B-Genes & Molecular Sequences | 0.643 | 0.656 | 0.649 | 957 |
+ | I-Genes & Molecular Sequences | 0.677 | 0.710 | 0.693 | 1529 |
+ | B-Geographic Areas | 0.793 | 0.809 | 0.801 | 674 |
+ | I-Geographic Areas | 0.746 | 0.768 | 0.756 | 542 |
+ | B-Living Beings | 0.795 | 0.801 | 0.798 | 4185 |
+ | I-Living Beings | 0.844 | 0.820 | 0.832 | 4850 |
+ | B-Objects | 0.610 | 0.715 | 0.658 | 256 |
+ | I-Objects | 0.675 | 0.672 | 0.674 | 244 |
+ | B-Occupations | 0.521 | 0.508 | 0.514 | 197 |
+ | I-Occupations | 0.474 | 0.364 | 0.412 | 173 |
+ | B-Organizations | 0.579 | 0.650 | 0.613 | 443 |
+ | I-Organizations | 0.640 | 0.711 | 0.673 | 622 |
+ | B-Phenomena | 0.341 | 0.343 | 0.342 | 274 |
+ | I-Phenomena | 0.285 | 0.321 | 0.302 | 212 |
+ | B-Physiology | 0.640 | 0.604 | 0.621 | 3726 |
+ | I-Physiology | 0.549 | 0.574 | 0.561 | 2688 |
+ | B-Procedures | 0.661 | 0.651 | 0.656 | 6454 |
+ | I-Procedures | 0.679 | 0.662 | 0.671 | 6567 |
+ | macro avg | 0.662 | 0.645 | 0.651 | 269146 |
+ | weighted avg | 0.869 | 0.873 | 0.870 | 269146 |
+
+
+ # Performance on Testing Set
+
+ ## Span Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | Anatomy | 0.656 | 0.672 | 0.664 | 3277 |
+ | Chemicals & Drugs | 0.748 | 0.745 | 0.747 | 7398 |
+ | Concepts & Ideas | 0.515 | 0.370 | 0.430 | 3683 |
+ | Devices | 0.447 | 0.372 | 0.406 | 355 |
+ | Disorders | 0.691 | 0.641 | 0.665 | 8109 |
+ | Genes & Molecular Sequences | 0.506 | 0.567 | 0.535 | 1115 |
+ | Geographic Areas | 0.671 | 0.737 | 0.703 | 598 |
+ | Living Beings | 0.718 | 0.739 | 0.728 | 3994 |
+ | Objects | 0.518 | 0.598 | 0.555 | 336 |
+ | Occupations | 0.367 | 0.480 | 0.416 | 196 |
+ | Organizations | 0.504 | 0.634 | 0.561 | 382 |
+ | Phenomena | 0.206 | 0.271 | 0.234 | 269 |
+ | Physiology | 0.560 | 0.582 | 0.571 | 3833 |
+ | Procedures | 0.597 | 0.607 | 0.602 | 6599 |
+ | macro avg | 0.550 | 0.573 | 0.558 | 40144 |
+ | weighted avg | 0.641 | 0.630 | 0.634 | 40144 |
+
+ ## Token Level
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | O | 0.933 | 0.945 | 0.939 | 188640 |
+ | B-Anatomy | 0.733 | 0.725 | 0.729 | 3258 |
+ | I-Anatomy | 0.715 | 0.766 | 0.740 | 3035 |
+ | B-Chemicals & Drugs | 0.814 | 0.803 | 0.808 | 7363 |
+ | I-Chemicals & Drugs | 0.825 | 0.843 | 0.834 | 9659 |
+ | B-Concepts & Ideas | 0.625 | 0.382 | 0.474 | 3661 |
+ | I-Concepts & Ideas | 0.562 | 0.511 | 0.535 | 3083 |
+ | B-Devices | 0.583 | 0.367 | 0.451 | 354 |
+ | I-Devices | 0.465 | 0.451 | 0.458 | 566 |
+ | B-Disorders | 0.765 | 0.681 | 0.721 | 8047 |
+ | I-Disorders | 0.738 | 0.703 | 0.720 | 7601 |
+ | B-Genes & Molecular Sequences | 0.608 | 0.665 | 0.635 | 1111 |
+ | I-Genes & Molecular Sequences | 0.607 | 0.656 | 0.631 | 1694 |
+ | B-Geographic Areas | 0.729 | 0.810 | 0.767 | 594 |
+ | I-Geographic Areas | 0.720 | 0.684 | 0.702 | 557 |
+ | B-Living Beings | 0.787 | 0.801 | 0.794 | 3984 |
+ | I-Living Beings | 0.837 | 0.815 | 0.826 | 4514 |
+ | B-Objects | 0.589 | 0.645 | 0.615 | 335 |
+ | I-Objects | 0.698 | 0.642 | 0.669 | 299 |
+ | B-Occupations | 0.444 | 0.549 | 0.491 | 195 |
+ | I-Occupations | 0.379 | 0.508 | 0.434 | 130 |
+ | B-Organizations | 0.551 | 0.675 | 0.607 | 382 |
+ | I-Organizations | 0.595 | 0.802 | 0.683 | 511 |
+ | B-Phenomena | 0.292 | 0.321 | 0.306 | 265 |
+ | I-Phenomena | 0.272 | 0.340 | 0.302 | 209 |
+ | B-Physiology | 0.642 | 0.614 | 0.628 | 3814 |
+ | I-Physiology | 0.552 | 0.625 | 0.586 | 2805 |
+ | B-Procedures | 0.675 | 0.657 | 0.666 | 6558 |
+ | I-Procedures | 0.707 | 0.669 | 0.687 | 6928 |
+ | macro avg | 0.636 | 0.643 | 0.636 | 270152 |
+ | weighted avg | 0.869 | 0.871 | 0.869 | 270152 |
+
+
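The `macro avg` and `weighted avg` rows in these tables can be reproduced from the per-label rows: the macro average is the plain mean of the per-label scores, and the weighted average weights each label by its support. The sketch below recomputes both F1 averages for the testing-set span-level table from its rounded values, so agreement is only expected to about three decimal places. The variable names are illustrative assumptions.

```python
# (label, F1-score, support) rows copied from the testing-set span-level table.
rows = [
    ("Anatomy", 0.664, 3277),
    ("Chemicals & Drugs", 0.747, 7398),
    ("Concepts & Ideas", 0.430, 3683),
    ("Devices", 0.406, 355),
    ("Disorders", 0.665, 8109),
    ("Genes & Molecular Sequences", 0.535, 1115),
    ("Geographic Areas", 0.703, 598),
    ("Living Beings", 0.728, 3994),
    ("Objects", 0.555, 336),
    ("Occupations", 0.416, 196),
    ("Organizations", 0.561, 382),
    ("Phenomena", 0.234, 269),
    ("Physiology", 0.571, 3833),
    ("Procedures", 0.602, 6599),
]

total_support = sum(support for _, _, support in rows)

# Macro average: every label counts equally, regardless of how often it occurs.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each label contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"support={total_support} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

The gap between the two averages reflects that low-support labels such as Phenomena and Occupations score well below the high-support labels, which pulls the macro average down more than the weighted one.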
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9eb0be8c505b7f7bc80024c0457befd900a96d3f1eb3524224a115b11dd2ad72
+ size 14244
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
trainer_state.json ADDED
@@ -0,0 +1,102 @@
+ {
+ "best_metric": 0.6506678662586591,
+ "best_model_checkpoint": "tmp_ner_fantastic-bale-19_38/run-25/checkpoint-660",
+ "epoch": 4.0,
+ "eval_steps": 500,
+ "global_step": 660,
+ "is_hyper_param_search": true,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.8553461689937804,
+ "eval_loss": 0.47332701086997986,
+ "eval_macro_f1": 0.5179058864521847,
+ "eval_macro_precision": 0.5969084028407209,
+ "eval_macro_recall": 0.5063081060851987,
+ "eval_runtime": 5.987,
+ "eval_samples_per_second": 146.651,
+ "eval_steps_per_second": 18.373,
+ "step": 165
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.8667526175384364,
+ "eval_loss": 0.423623263835907,
+ "eval_macro_f1": 0.6126875404495669,
+ "eval_macro_precision": 0.6399554829251698,
+ "eval_macro_recall": 0.6102712004848894,
+ "eval_runtime": 5.9737,
+ "eval_samples_per_second": 146.978,
+ "eval_steps_per_second": 18.414,
+ "step": 330
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.8703603248794335,
+ "eval_loss": 0.4243398904800415,
+ "eval_macro_f1": 0.6335065301362576,
+ "eval_macro_precision": 0.6801343369395938,
+ "eval_macro_recall": 0.6289027870007556,
+ "eval_runtime": 5.9773,
+ "eval_samples_per_second": 146.888,
+ "eval_steps_per_second": 18.403,
+ "step": 495
+ },
+ {
+ "epoch": 3.0303030303030303,
+ "grad_norm": 0.778854250907898,
+ "learning_rate": 9.035786517978707e-05,
+ "loss": 0.5906,
+ "step": 500
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.8725375818329085,
+ "eval_loss": 0.43925729393959045,
+ "eval_macro_f1": 0.6506678662586591,
+ "eval_macro_precision": 0.6623715223268644,
+ "eval_macro_recall": 0.6448063227018537,
+ "eval_runtime": 5.9602,
+ "eval_samples_per_second": 147.311,
+ "eval_steps_per_second": 18.456,
+ "step": 660
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 5280,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 32,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 3,
+ "early_stopping_threshold": 0.001
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3213359444608236.0,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": {
+ "learning_rate": 9.767344966191627e-05,
+ "per_device_train_batch_size": 16,
+ "warmup_ratio": 0.021367464793327073,
+ "weight_decay": 0.025286446963170207
+ }
+ }
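The state file above can be read as follows: `best_metric` equals the `eval_macro_f1` logged at step 660, and `max_steps` equals the steps per epoch times `num_train_epochs`. The sketch below checks both on a trimmed copy of `log_history`. It assumes, without confirmation from the training code, that macro F1 was the metric used for checkpoint selection, and the variable names are illustrative.

```python
# Trimmed copy of log_history from trainer_state.json (only the fields used here).
log_history = [
    {"epoch": 1.0, "eval_macro_f1": 0.5179058864521847, "step": 165},
    {"epoch": 2.0, "eval_macro_f1": 0.6126875404495669, "step": 330},
    {"epoch": 3.0, "eval_macro_f1": 0.6335065301362576, "step": 495},
    {"epoch": 3.0303030303030303, "loss": 0.5906, "step": 500},  # training log, no eval
    {"epoch": 4.0, "eval_macro_f1": 0.6506678662586591, "step": 660},
]
num_train_epochs = 32  # value from the state file

# Keep only evaluation entries; training-loss entries carry no eval_* keys.
eval_entries = [entry for entry in log_history if "eval_macro_f1" in entry]

# Assumption: the best checkpoint is the one with the highest validation macro F1.
best = max(eval_entries, key=lambda entry: entry["eval_macro_f1"])

# One evaluation per epoch, so the first evaluation step gives the steps per epoch.
steps_per_epoch = eval_entries[0]["step"] // int(eval_entries[0]["epoch"])
planned_max_steps = steps_per_epoch * num_train_epochs

print(best["step"], best["eval_macro_f1"], steps_per_epoch, planned_max_steps)
```

Under that assumption, the selected entry lines up with `best_model_checkpoint` ending in `checkpoint-660`, and the planned step count matches the `max_steps` value of 5280 in the file.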
vocab.txt ADDED
The diff for this file is too large to render. See raw diff