Upload folder using huggingface_hub
- README.md +82 -0
- best_hyperparameters.json +7 -0
- config.json +56 -0
- model.safetensors +3 -0
- performance_report.json +869 -0
- performance_report.md +180 -0
- rng_state.pth +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +58 -0
- trainer_state.json +102 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,82 @@
---
task: token-classification
tags:
- biomedical
- bionlp
license: mit
base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
---

# bioner_medmentions_st21pv

This is a named entity recognition model fine-tuned from the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model. It predicts spans with 14 possible labels. The labels are **Anatomy, Chemicals & Drugs, Concepts & Ideas, Devices, Disorders, Genes & Molecular Sequences, Geographic Areas, Living Beings, Objects, Occupations, Organizations, Phenomena, Physiology and Procedures**.

The code used for training this model can be found at https://github.com/Glasgow-AI4BioMed/bioner along with links to other biomedical NER models trained on well-known biomedical corpora. The source dataset information is below.

## Example Usage

The code below loads the model and applies it to the provided text. It uses a simple aggregation strategy to post-process the individual tokens into larger multi-token entities where needed.

```python
from transformers import pipeline

# Load the model as part of an NER pipeline
ner_pipeline = pipeline("token-classification",
                        model="Glasgow-AI4BioMed/bioner_medmentions_st21pv",
                        aggregation_strategy="max")

# Apply it to some text
ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes for NSCLC patients receiving erlotinib.")

# Output:
# [ {"entity_group": "Disorders", "score": 0.62466, "word": "egfr t790m mutations", "start": 0, "end": 20},
#   {"entity_group": "Disorders", "score": 0.98835, "word": "nsclc", "start": 51, "end": 56},
#   {"entity_group": "Chemicals & Drugs", "score": 0.97885, "word": "erlotinib", "start": 76, "end": 85} ]
```

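To illustrate what aggregating tokens into entities means, the sketch below merges per-token IOB2 labels (the `B-`/`I-` scheme this model uses) into labelled spans. It is a simplified illustration and not the pipeline's actual implementation: the `iob2_to_spans` helper and the example tokens and labels are hypothetical, and the real `max` strategy also resolves sub-word pieces using their scores.

```python
def iob2_to_spans(tokens, labels):
    """Merge (token, IOB2 label) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new span.
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if label == "O" or starts_new:
            # Close the span collected so far, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if label != "O":
            if not current_tokens:
                current_type = entity_type
            current_tokens.append(token)
    # Close a span that runs to the end of the sequence.
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical word-level tokens and labels, for illustration only.
tokens = ["egfr", "t790m", "mutations", "affect", "nsclc", "patients", "receiving", "erlotinib"]
labels = ["B-Disorders", "I-Disorders", "I-Disorders", "O", "B-Disorders", "O", "O", "B-Chemicals & Drugs"]
spans = iob2_to_spans(tokens, labels)
print(spans)
```

A stray `I-` tag that follows an `O` is treated here as the start of a new span, which is one common way to handle malformed sequences.
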
## Dataset Info

**Source:** The ST21pv version of MedMentions was downloaded from: https://github.com/chanzuckerberg/MedMentions/tree/master/st21pv

The dataset should be cited with: Mohan, Sunil, and Donghui Li. "MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts." Automated Knowledge Base Construction (AKBC), 2019, https://openreview.net/forum?id=SylxCx5pTQ. DOI: [10.24432/C5G59C](https://doi.org/10.24432/C5G59C)

An overview of semantic types can be found at: https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html

**Preprocessing:** The training, validation and test splits were maintained from the original dataset. Concept identifiers (CUIs) were used to map each annotation to its associated UMLS entry to recover semantic types (from the MRSTY.RRF UMLS file). Semantic types provided in MedMentions were not used. Annotations were mapped to *semantic group* names using the Semantic Groups file available at: https://www.nlm.nih.gov/research/umls/knowledge_sources/semantic_network/index.html. This contrasts with the fine-grained version, which mapped annotations to *semantic types*. The preprocessing script for this dataset is [prepare_medmentions.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_medmentions.py.py), run without the --finegrain flag.

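The mapping described above (CUI to semantic types via MRSTY.RRF, then semantic type to semantic group via the Semantic Groups file) can be sketched roughly as follows. This is a minimal illustration and not the actual `prepare_medmentions.py` logic: the function names are hypothetical, the CUIs and ATUIs in the sample rows are made up, and the rows are inlined rather than read from the real UMLS files.

```python
from collections import defaultdict

# Illustrative rows only (hypothetical CUIs and ATUIs), in the pipe-delimited
# layouts of MRSTY.RRF (CUI|TUI|STN|STY|ATUI|CVF|) and the Semantic Groups
# file (group abbreviation|group name|TUI|semantic type name).
MRSTY_ROWS = [
    "C0000001|T047|B2.2.1.2.1|Disease or Syndrome|AT00000001||",
    "C0000002|T121|A1.4.1.1.1|Pharmacologic Substance|AT00000002||",
]
SEMGROUP_ROWS = [
    "DISO|Disorders|T047|Disease or Syndrome",
    "CHEM|Chemicals & Drugs|T121|Pharmacologic Substance",
]


def load_cui_to_tuis(rows):
    """Map each CUI to the set of semantic type identifiers (TUIs) it carries."""
    cui_to_tuis = defaultdict(set)
    for row in rows:
        cui, tui = row.split("|")[:2]
        cui_to_tuis[cui].add(tui)
    return cui_to_tuis


def load_tui_to_group(rows):
    """Map each TUI to the name of its semantic group."""
    tui_to_group = {}
    for row in rows:
        _abbrev, group_name, tui, _type_name = row.split("|")
        tui_to_group[tui] = group_name
    return tui_to_group


def semantic_groups_for(cui, cui_to_tuis, tui_to_group):
    """Return the sorted semantic group names for a CUI (empty if unknown)."""
    tuis = cui_to_tuis.get(cui, ())
    return sorted({tui_to_group[t] for t in tuis if t in tui_to_group})


cui_to_tuis = load_cui_to_tuis(MRSTY_ROWS)
tui_to_group = load_tui_to_group(SEMGROUP_ROWS)
print(semantic_groups_for("C0000001", cui_to_tuis, tui_to_group))
```

A concept can carry more than one semantic type, which is why the lookup returns a list; how the actual script resolves such cases is not shown here.
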
## Performance

The span-level performance on the test split for the different labels is shown in the table below. The full performance results are available in the model repo in Markdown format for viewing and JSON format for easier loading. These include the performance at token level (with individual B- and I- labels as the token classifier uses IOB2 token labelling).

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Anatomy | 0.656 | 0.672 | 0.664 | 3277 |
| Chemicals & Drugs | 0.748 | 0.745 | 0.747 | 7398 |
| Concepts & Ideas | 0.515 | 0.370 | 0.430 | 3683 |
| Devices | 0.447 | 0.372 | 0.406 | 355 |
| Disorders | 0.691 | 0.641 | 0.665 | 8109 |
| Genes & Molecular Sequences | 0.506 | 0.567 | 0.535 | 1115 |
| Geographic Areas | 0.671 | 0.737 | 0.703 | 598 |
| Living Beings | 0.718 | 0.739 | 0.728 | 3994 |
| Objects | 0.518 | 0.598 | 0.555 | 336 |
| Occupations | 0.367 | 0.480 | 0.416 | 196 |
| Organizations | 0.504 | 0.634 | 0.561 | 382 |
| Phenomena | 0.206 | 0.271 | 0.234 | 269 |
| Physiology | 0.560 | 0.582 | 0.571 | 3833 |
| Procedures | 0.597 | 0.607 | 0.602 | 6599 |
| macro avg | 0.550 | 0.573 | 0.558 | 40144 |
| weighted avg | 0.641 | 0.630 | 0.634 | 40144 |

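The two summary rows differ only in how the per-label scores are combined: the macro average is a plain mean over the 14 labels, while the weighted average weights each label by its support. The sketch below recomputes both F1 averages from the rounded values in the table above; the variable and function names are illustrative, and because the inputs are rounded to three decimals the results only agree with the table to that precision.

```python
# (label, F1-score, support) copied from the span-level test table above.
ROWS = [
    ("Anatomy", 0.664, 3277),
    ("Chemicals & Drugs", 0.747, 7398),
    ("Concepts & Ideas", 0.430, 3683),
    ("Devices", 0.406, 355),
    ("Disorders", 0.665, 8109),
    ("Genes & Molecular Sequences", 0.535, 1115),
    ("Geographic Areas", 0.703, 598),
    ("Living Beings", 0.728, 3994),
    ("Objects", 0.555, 336),
    ("Occupations", 0.416, 196),
    ("Organizations", 0.561, 382),
    ("Phenomena", 0.234, 269),
    ("Physiology", 0.571, 3833),
    ("Procedures", 0.602, 6599),
]


def macro_average(rows):
    """Unweighted mean of the per-label scores."""
    return sum(score for _, score, _ in rows) / len(rows)


def weighted_average(rows):
    """Mean of the per-label scores, weighted by each label's support."""
    total_support = sum(support for _, _, support in rows)
    return sum(score * support for _, score, support in rows) / total_support


macro_f1 = macro_average(ROWS)
weighted_f1 = weighted_average(ROWS)
print(f"macro F1: {macro_f1:.3f}, weighted F1: {weighted_f1:.3f}")
```

The gap between the two averages reflects the low scores on small classes such as Phenomena and Occupations, which count as much as Disorders in the macro average but contribute little to the weighted one.
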
## Hyperparameters

Hyperparameter tuning was done with [optuna](https://optuna.org/) and the [hyperparameter_search](https://huggingface.co/docs/transformers/en/hpo_train) functionality. A total of 100 trials were run. Early stopping was applied during training. The best-performing model was selected using the macro F1 performance on the validation set. The selected hyperparameters are in the table below.

| Hyperparameter | Value |
|----------------|-------|
| epochs | 4.0 |
| learning_rate | 9.767344966191627e-05 |
| per_device_train_batch_size | 16 |
| weight_decay | 0.025286446963170207 |
| warmup_ratio | 0.021367464793327073 |

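As a rough sketch of how these values might be reused, the snippet below turns the selected hyperparameters into keyword arguments for `transformers.TrainingArguments`. It assumes that `epochs` corresponds to the `num_train_epochs` argument, which is a guess rather than something the training code confirms; the `to_training_kwargs` helper and the `output_dir` value are hypothetical.

```python
import json

# Selected hyperparameters from the table above, inlined as JSON so the
# sketch is self-contained.
BEST_HYPERPARAMETERS_JSON = """
{
  "epochs": 4.0,
  "learning_rate": 9.767344966191627e-05,
  "per_device_train_batch_size": 16,
  "weight_decay": 0.025286446963170207,
  "warmup_ratio": 0.021367464793327073
}
"""


def to_training_kwargs(best):
    """Rename keys to TrainingArguments names (assumed mapping for 'epochs')."""
    kwargs = dict(best)
    kwargs["num_train_epochs"] = kwargs.pop("epochs")
    return kwargs


training_kwargs = to_training_kwargs(json.loads(BEST_HYPERPARAMETERS_JSON))
print(training_kwargs)

# With transformers installed, the values could then be passed on, e.g.:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)
```
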
best_hyperparameters.json
ADDED
@@ -0,0 +1,7 @@
{
  "epochs": 4.0,
  "learning_rate": 9.767344966191627e-05,
  "per_device_train_batch_size": 16,
  "weight_decay": 0.025286446963170207,
  "warmup_ratio": 0.021367464793327073
}
config.json
ADDED
@@ -0,0 +1,56 @@
{
  "_name_or_path": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
  "architectures": [
    "BertForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-Anatomy",
    "2": "I-Anatomy",
    "3": "B-Chemicals & Drugs",
    "4": "I-Chemicals & Drugs",
    "5": "B-Concepts & Ideas",
    "6": "I-Concepts & Ideas",
    "7": "B-Devices",
    "8": "I-Devices",
    "9": "B-Disorders",
    "10": "I-Disorders",
    "11": "B-Genes & Molecular Sequences",
    "12": "I-Genes & Molecular Sequences",
    "13": "B-Geographic Areas",
    "14": "I-Geographic Areas",
    "15": "B-Living Beings",
    "16": "I-Living Beings",
    "17": "B-Objects",
    "18": "I-Objects",
    "19": "B-Occupations",
    "20": "I-Occupations",
    "21": "B-Organizations",
    "22": "I-Organizations",
    "23": "B-Phenomena",
    "24": "I-Phenomena",
    "25": "B-Physiology",
    "26": "I-Physiology",
    "27": "B-Procedures",
    "28": "I-Procedures"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f089f7dbe5fce6283f669a8ca9cb4ed37d330acc16a1cc129b4086edbc54404e
size 435679140
performance_report.json
ADDED
@@ -0,0 +1,869 @@
{
  "train": {
    "token_level": {
      "O": {
        "precision": 0.9720447712640268,
        "recall": 0.976257153730711,
        "f1-score": 0.9741464087457614,
        "support": 557052.0
      },
      "B-Anatomy": {
        "precision": 0.919032967032967,
        "recall": 0.9125349162011173,
        "f1-score": 0.9157724146993124,
        "support": 11456.0
      },
      "I-Anatomy": {
        "precision": 0.9173255246461689,
        "recall": 0.9449024733561231,
        "f1-score": 0.9309098113020653,
        "support": 9946.0
      },
      "B-Chemicals & Drugs": {
        "precision": 0.9425027164070989,
        "recall": 0.9337938458778147,
        "f1-score": 0.9381280699382633,
        "support": 22294.0
      },
      "I-Chemicals & Drugs": {
        "precision": 0.9631471190406552,
        "recall": 0.9549233148601624,
        "f1-score": 0.9590175870047082,
        "support": 31036.0
      },
      "B-Concepts & Ideas": {
        "precision": 0.8773684210526316,
        "recall": 0.6520633678857813,
        "f1-score": 0.74812072254011,
        "support": 10226.0
      },
      "I-Concepts & Ideas": {
        "precision": 0.8836440877979702,
        "recall": 0.8557714285714285,
        "f1-score": 0.8694844403158384,
        "support": 8750.0
      },
      "B-Devices": {
        "precision": 0.8684701492537313,
        "recall": 0.8005159071367154,
        "f1-score": 0.8331096196868009,
        "support": 1163.0
      },
      "I-Devices": {
        "precision": 0.8502097064110246,
        "recall": 0.8857677902621723,
        "f1-score": 0.8676245796392541,
        "support": 1602.0
      },
      "B-Disorders": {
        "precision": 0.9004460226042523,
        "recall": 0.8502964628910243,
        "f1-score": 0.8746529822495163,
        "support": 24455.0
      },
      "I-Disorders": {
        "precision": 0.8971253717191743,
        "recall": 0.9068571926461619,
        "f1-score": 0.9019650323894534,
        "support": 22954.0
      },
      "B-Genes & Molecular Sequences": {
        "precision": 0.8677474402730375,
        "recall": 0.9045360213459828,
        "f1-score": 0.8857599070982726,
        "support": 3373.0
      },
      "I-Genes & Molecular Sequences": {
        "precision": 0.8909386869234943,
        "recall": 0.9354348651728067,
        "f1-score": 0.9126447429365447,
        "support": 5266.0
      },
      "B-Geographic Areas": {
        "precision": 0.9166666666666666,
        "recall": 0.9500587544065805,
        "f1-score": 0.933064050778996,
        "support": 1702.0
      },
      "I-Geographic Areas": {
        "precision": 0.9370424597364568,
        "recall": 0.9343065693430657,
        "f1-score": 0.935672514619883,
        "support": 1370.0
      },
      "B-Living Beings": {
        "precision": 0.9251754385964912,
        "recall": 0.9346034559149313,
        "f1-score": 0.9298655499228565,
        "support": 11285.0
      },
      "I-Living Beings": {
        "precision": 0.9578313253012049,
        "recall": 0.9625474210993623,
        "f1-score": 0.9601835822698176,
        "support": 12389.0
      },
      "B-Objects": {
        "precision": 0.8566810344827587,
        "recall": 0.9555288461538461,
        "f1-score": 0.9034090909090909,
        "support": 832.0
      },
      "I-Objects": {
        "precision": 0.9041095890410958,
        "recall": 0.9361702127659575,
        "f1-score": 0.9198606271777003,
        "support": 705.0
      },
      "B-Occupations": {
        "precision": 0.6485148514851485,
        "recall": 0.7586872586872587,
        "f1-score": 0.699288256227758,
        "support": 518.0
      },
      "I-Occupations": {
        "precision": 0.6,
        "recall": 0.7557251908396947,
        "f1-score": 0.668918918918919,
        "support": 393.0
      },
      "B-Organizations": {
        "precision": 0.8365045806906272,
        "recall": 0.9158950617283951,
        "f1-score": 0.874401473296501,
        "support": 1296.0
      },
      "I-Organizations": {
        "precision": 0.8566362715298885,
        "recall": 0.9575311438278595,
        "f1-score": 0.9042780748663102,
        "support": 1766.0
      },
      "B-Phenomena": {
        "precision": 0.5407685098406748,
        "recall": 0.5133451957295374,
        "f1-score": 0.5267001369237791,
        "support": 1124.0
      },
      "I-Phenomena": {
        "precision": 0.5816203143893591,
        "recall": 0.4863498483316481,
        "f1-score": 0.5297356828193832,
        "support": 989.0
      },
      "B-Physiology": {
        "precision": 0.8493248045486852,
        "recall": 0.8194753985942054,
        "f1-score": 0.8341331471948347,
        "support": 11666.0
      },
      "I-Physiology": {
        "precision": 0.8103037213641546,
        "recall": 0.8807887323943662,
        "f1-score": 0.8440773134650685,
        "support": 8875.0
      },
      "B-Procedures": {
        "precision": 0.8294550810014728,
        "recall": 0.8438717410848067,
        "f1-score": 0.8366013071895425,
        "support": 20022.0
      },
      "I-Procedures": {
        "precision": 0.8911117462132038,
        "recall": 0.8955909808990378,
        "f1-score": 0.8933457488718574,
        "support": 20889.0
      },
      "accuracy": 0.9494197870855755,
      "macro avg": {
        "precision": 0.8514396337694524,
        "recall": 0.8625562259220192,
        "f1-score": 0.8553404066895931,
        "support": 805394.0
      },
      "weighted avg": {
        "precision": 0.9492471775259178,
        "recall": 0.9494197870855755,
        "f1-score": 0.9490351220280436,
        "support": 805394.0
      }
    },
    "span_level": {
      "Anatomy": {
        "precision": 0.8700173310225303,
        "recall": 0.872360761143453,
        "f1-score": 0.8711874701722417,
        "support": 11509
      },
      "Chemicals & Drugs": {
        "precision": 0.9019501380028053,
        "recall": 0.8886412268188303,
        "f1-score": 0.8952462219028585,
        "support": 22432
      },
      "Concepts & Ideas": {
        "precision": 0.7898267870212247,
        "recall": 0.630293001070768,
        "f1-score": 0.7010990200855395,
        "support": 10273
      },
      "Devices": {
        "precision": 0.8045178105994787,
        "recall": 0.7914529914529914,
        "f1-score": 0.7979319258940112,
        "support": 1170
      },
      "Disorders": {
        "precision": 0.8474890461745871,
        "recall": 0.8165949500690103,
        "f1-score": 0.8317552201777961,
        "support": 24634
      },
      "Genes & Molecular Sequences": {
        "precision": 0.8086003372681282,
        "recall": 0.8506800709639266,
        "f1-score": 0.829106628242075,
        "support": 3382
      },
      "Geographic Areas": {
        "precision": 0.876410835214447,
        "recall": 0.907126168224299,
        "f1-score": 0.89150401836969,
        "support": 1712
      },
      "Living Beings": {
        "precision": 0.8884203127745564,
        "recall": 0.8879522304179839,
        "f1-score": 0.8881862099253405,
        "support": 11388
      },
      "Objects": {
        "precision": 0.8064171122994652,
        "recall": 0.8860164512338425,
        "f1-score": 0.8443449048152295,
        "support": 851
      },
      "Occupations": {
        "precision": 0.5763239875389408,
        "recall": 0.7088122605363985,
        "f1-score": 0.6357388316151202,
        "support": 522
      },
      "Organizations": {
        "precision": 0.7973811164713991,
        "recall": 0.8845565749235474,
        "f1-score": 0.8387096774193549,
        "support": 1308
      },
      "Phenomena": {
        "precision": 0.4477234401349072,
        "recall": 0.46949602122015915,
        "f1-score": 0.45835131635735865,
        "support": 1131
      },
      "Physiology": {
        "precision": 0.7840562521179262,
        "recall": 0.7892717039058502,
        "f1-score": 0.7866553336166596,
        "support": 11726
      },
      "Procedures": {
        "precision": 0.7731168893358233,
        "recall": 0.801439563167039,
        "f1-score": 0.7870234961489714,
        "support": 20145
      },
      "macro avg": {
        "precision": 0.78373224256973,
        "recall": 0.7989067125105784,
        "f1-score": 0.7897743053387318,
        "support": 122183
      },
      "weighted avg": {
        "precision": 0.8334626249065854,
        "recall": 0.8204496533887693,
        "f1-score": 0.8260050590679613,
        "support": 122183
      }
    }
  },
  "val": {
    "token_level": {
      "O": {
        "precision": 0.930847233100809,
        "recall": 0.9466419326729286,
        "f1-score": 0.9386781450775005,
        "support": 187057.0
      },
      "B-Anatomy": {
        "precision": 0.7610457516339869,
        "recall": 0.7537545313309166,
        "f1-score": 0.757382593989853,
        "support": 3862.0
      },
      "I-Anatomy": {
        "precision": 0.7202499289974439,
        "recall": 0.7752980739834913,
        "f1-score": 0.7467608951707891,
        "support": 3271.0
      },
      "B-Chemicals & Drugs": {
        "precision": 0.8210823909531503,
        "recall": 0.823188014576866,
        "f1-score": 0.8221338545528072,
        "support": 7409.0
      },
      "I-Chemicals & Drugs": {
        "precision": 0.8563915857605178,
        "recall": 0.8465460361891433,
        "f1-score": 0.8514403499069931,
        "support": 10003.0
      },
      "B-Concepts & Ideas": {
        "precision": 0.5897058823529412,
        "recall": 0.3599640933572711,
        "f1-score": 0.44704570791527315,
        "support": 3342.0
      },
      "I-Concepts & Ideas": {
        "precision": 0.5564516129032258,
        "recall": 0.45887294364718234,
        "f1-score": 0.5029733358910417,
        "support": 2857.0
      },
      "B-Devices": {
        "precision": 0.6964856230031949,
        "recall": 0.44308943089430897,
        "f1-score": 0.5416149068322982,
        "support": 492.0
      },
      "I-Devices": {
        "precision": 0.627677100494234,
        "recall": 0.5537790697674418,
        "f1-score": 0.5884169884169884,
        "support": 688.0
      },
      "B-Disorders": {
        "precision": 0.7699307347548554,
        "recall": 0.6881524641903375,
        "f1-score": 0.7267482853663226,
        "support": 8238.0
      },
      "I-Disorders": {
        "precision": 0.7285814116002796,
        "recall": 0.7107990182710663,
        "f1-score": 0.7195803713161709,
        "support": 7334.0
      },
      "B-Genes & Molecular Sequences": {
        "precision": 0.6427840327533265,
        "recall": 0.6562173458725182,
        "f1-score": 0.6494312306101344,
        "support": 957.0
      },
      "I-Genes & Molecular Sequences": {
        "precision": 0.6770573566084788,
        "recall": 0.71026814911707,
        "f1-score": 0.6932652409830833,
        "support": 1529.0
      },
      "B-Geographic Areas": {
        "precision": 0.7933042212518195,
        "recall": 0.8086053412462908,
        "f1-score": 0.8008817046289493,
        "support": 674.0
      },
      "I-Geographic Areas": {
        "precision": 0.7455197132616488,
        "recall": 0.7675276752767528,
        "f1-score": 0.7563636363636363,
        "support": 542.0
      },
      "B-Living Beings": {
        "precision": 0.79478672985782,
        "recall": 0.8014336917562724,
        "f1-score": 0.7980963712076146,
        "support": 4185.0
      },
      "I-Living Beings": {
        "precision": 0.8440483768300445,
        "recall": 0.8202061855670103,
        "f1-score": 0.8319564990065879,
        "support": 4850.0
      },
      "B-Objects": {
        "precision": 0.61,
        "recall": 0.71484375,
        "f1-score": 0.658273381294964,
        "support": 256.0
      },
      "I-Objects": {
        "precision": 0.6748971193415638,
        "recall": 0.6721311475409836,
        "f1-score": 0.6735112936344969,
        "support": 244.0
      },
      "B-Occupations": {
        "precision": 0.5208333333333334,
        "recall": 0.5076142131979695,
        "f1-score": 0.5141388174807198,
        "support": 197.0
      },
      "I-Occupations": {
        "precision": 0.47368421052631576,
        "recall": 0.36416184971098264,
        "f1-score": 0.4117647058823529,
        "support": 173.0
      },
      "B-Organizations": {
        "precision": 0.579476861167002,
        "recall": 0.6501128668171557,
        "f1-score": 0.6127659574468085,
        "support": 443.0
      },
      "I-Organizations": {
        "precision": 0.6396526772793053,
        "recall": 0.7106109324758842,
        "f1-score": 0.6732673267326733,
        "support": 622.0
      },
      "B-Phenomena": {
        "precision": 0.34057971014492755,
        "recall": 0.34306569343065696,
        "f1-score": 0.3418181818181818,
        "support": 274.0
      },
      "I-Phenomena": {
        "precision": 0.28451882845188287,
        "recall": 0.32075471698113206,
        "f1-score": 0.30155210643015523,
        "support": 212.0
      },
      "B-Physiology": {
        "precision": 0.6399317406143344,
        "recall": 0.6038647342995169,
        "f1-score": 0.6213753106876554,
        "support": 3726.0
      },
      "I-Physiology": {
        "precision": 0.5488801990757198,
        "recall": 0.5744047619047619,
        "f1-score": 0.5613524813670242,
        "support": 2688.0
      },
      "B-Procedures": {
        "precision": 0.6613690007867821,
        "recall": 0.6512240471025721,
        "f1-score": 0.6562573190725272,
        "support": 6454.0
      },
      "I-Procedures": {
        "precision": 0.6790007806401249,
        "recall": 0.6622506471752703,
        "f1-score": 0.6705211224175146,
        "support": 6567.0
      },
      "accuracy": 0.8725375818329085,
      "macro avg": {
        "precision": 0.6623715223268644,
        "recall": 0.6448063227018537,
        "f1-score": 0.6506678662586591,
        "support": 269146.0
      },
      "weighted avg": {
        "precision": 0.8694071423336295,
        "recall": 0.8725375818329085,
        "f1-score": 0.8703500500944416,
        "support": 269146.0
      }
    },
    "span_level": {
      "Anatomy": {
        "precision": 0.6821275523065289,
        "recall": 0.6972429786137594,
        "f1-score": 0.6896024464831805,
        "support": 3881
      },
      "Chemicals & Drugs": {
        "precision": 0.7514910536779325,
        "recall": 0.7570093457943925,
        "f1-score": 0.7542401064183571,
        "support": 7490
      },
      "Concepts & Ideas": {
        "precision": 0.4873985476292183,
        "recall": 0.33907875185735514,
        "f1-score": 0.39992989835261133,
        "support": 3365
      },
      "Devices": {
        "precision": 0.6056338028169014,
        "recall": 0.43610547667342797,
        "f1-score": 0.5070754716981132,
        "support": 493
      },
      "Disorders": {
        "precision": 0.7004692387904067,
        "recall": 0.6459134615384615,
        "f1-score": 0.6720860430215108,
        "support": 8320
      },
      "Genes & Molecular Sequences": {
        "precision": 0.5463414634146342,
        "recall": 0.5803108808290155,
        "f1-score": 0.5628140703517589,
        "support": 965
      },
      "Geographic Areas": {
        "precision": 0.7474600870827286,
        "recall": 0.7595870206489675,
        "f1-score": 0.753474762253109,
        "support": 678
      },
      "Living Beings": {
        "precision": 0.7159887798036466,
        "recall": 0.7244560075685903,
        "f1-score": 0.7201975076416647,
        "support": 4228
      },
      "Objects": {
        "precision": 0.551948051948052,
        "recall": 0.6563706563706564,
        "f1-score": 0.599647266313933,
        "support": 259
      },
      "Occupations": {
        "precision": 0.46568627450980393,
        "recall": 0.4797979797979798,
        "f1-score": 0.472636815920398,
        "support": 198
      },
      "Organizations": {
        "precision": 0.5019157088122606,
        "recall": 0.5783664459161147,
        "f1-score": 0.5374358974358974,
        "support": 453
      },
      "Phenomena": {
        "precision": 0.27672955974842767,
        "recall": 0.32116788321167883,
        "f1-score": 0.29729729729729726,
        "support": 274
      },
      "Physiology": {
        "precision": 0.5590094836670179,
        "recall": 0.5657158091175687,
        "f1-score": 0.562342652709686,
        "support": 3751
      },
      "Procedures": {
        "precision": 0.5877219380078242,
        "recall": 0.6000921800583807,
        "f1-score": 0.5938426453819841,
        "support": 6509
      },
      "macro avg": {
        "precision": 0.5842801101582417,
        "recall": 0.5815153484283105,
        "f1-score": 0.5801873486628216,
        "support": 40864
      },
      "weighted avg": {
        "precision": 0.6500699757755491,
        "recall": 0.6334915818324197,
        "f1-score": 0.6401859322619969,
        "support": 40864
      }
    }
  },
  "test": {
    "token_level": {
      "O": {
        "precision": 0.9327288328920794,
        "recall": 0.9450010602205259,
        "f1-score": 0.9388248429279391,
        "support": 188640.0
      },
      "B-Anatomy": {
        "precision": 0.7327746741154563,
        "recall": 0.7246777163904236,
        "f1-score": 0.7287037037037037,
        "support": 3258.0
      },
      "I-Anatomy": {
        "precision": 0.7148125384142594,
        "recall": 0.7663920922570017,
        "f1-score": 0.7397042455080299,
        "support": 3035.0
      },
      "B-Chemicals & Drugs": {
        "precision": 0.8137902559867878,
        "recall": 0.8030694010593508,
        "f1-score": 0.8083942853236722,
        "support": 7363.0
      },
      "I-Chemicals & Drugs": {
        "precision": 0.8251571052098114,
        "recall": 0.8428408737964592,
        "f1-score": 0.8339052496798975,
        "support": 9659.0
      },
      "B-Concepts & Ideas": {
        "precision": 0.6245535714285714,
        "recall": 0.3821360284075389,
        "f1-score": 0.47415692255549907,
        "support": 3661.0
      },
      "I-Concepts & Ideas": {
        "precision": 0.5620985010706638,
        "recall": 0.5108660395718456,
        "f1-score": 0.5352591333899746,
        "support": 3083.0
      },
      "B-Devices": {
        "precision": 0.5829596412556054,
        "recall": 0.3672316384180791,
        "f1-score": 0.4506065857885615,
        "support": 354.0
      },
      "I-Devices": {
        "precision": 0.4653284671532847,
        "recall": 0.450530035335689,
        "f1-score": 0.4578096947935368,
        "support": 566.0
      },
      "B-Disorders": {
        "precision": 0.7646812665643744,
        "recall": 0.6812476699391078,
        "f1-score": 0.7205573080967402,
        "support": 8047.0
      },
      "I-Disorders": {
        "precision": 0.7384636639955788,
        "recall": 0.7031969477700303,
        "f1-score": 0.7203989487162208,
        "support": 7601.0
      },
      "B-Genes & Molecular Sequences": {
        "precision": 0.6082304526748972,
        "recall": 0.6651665166516652,
        "f1-score": 0.6354256233877902,
        "support": 1111.0
      },
      "I-Genes & Molecular Sequences": {
        "precision": 0.6074357572443958,
        "recall": 0.6558441558441559,
        "f1-score": 0.6307124609707635,
        "support": 1694.0
      },
      "B-Geographic Areas": {
        "precision": 0.7287878787878788,
        "recall": 0.8097643097643098,
        "f1-score": 0.7671451355661882,
        "support": 594.0
      },
      "I-Geographic Areas": {
        "precision": 0.720226843100189,
        "recall": 0.6840215439856373,
        "f1-score": 0.7016574585635359,
        "support": 557.0
      },
672 |
+
"B-Living Beings": {
|
673 |
+
"precision": 0.7869297163995068,
|
674 |
+
"recall": 0.8009538152610441,
|
675 |
+
"f1-score": 0.7938798358004727,
|
676 |
+
"support": 3984.0
|
677 |
+
},
|
678 |
+
"I-Living Beings": {
|
679 |
+
"precision": 0.8368229403732362,
|
680 |
+
"recall": 0.8145768719539211,
|
681 |
+
"f1-score": 0.8255500673551863,
|
682 |
+
"support": 4514.0
|
683 |
+
},
|
684 |
+
"B-Objects": {
|
685 |
+
"precision": 0.5885558583106267,
|
686 |
+
"recall": 0.6447761194029851,
|
687 |
+
"f1-score": 0.6153846153846154,
|
688 |
+
"support": 335.0
|
689 |
+
},
|
690 |
+
"I-Objects": {
|
691 |
+
"precision": 0.6981818181818182,
|
692 |
+
"recall": 0.6421404682274248,
|
693 |
+
"f1-score": 0.6689895470383276,
|
694 |
+
"support": 299.0
|
695 |
+
},
|
696 |
+
"B-Occupations": {
|
697 |
+
"precision": 0.44398340248962653,
|
698 |
+
"recall": 0.5487179487179488,
|
699 |
+
"f1-score": 0.4908256880733945,
|
700 |
+
"support": 195.0
|
701 |
+
},
|
702 |
+
"I-Occupations": {
|
703 |
+
"precision": 0.3793103448275862,
|
704 |
+
"recall": 0.5076923076923077,
|
705 |
+
"f1-score": 0.4342105263157895,
|
706 |
+
"support": 130.0
|
707 |
+
},
|
708 |
+
"B-Organizations": {
|
709 |
+
"precision": 0.5512820512820513,
|
710 |
+
"recall": 0.675392670157068,
|
711 |
+
"f1-score": 0.6070588235294118,
|
712 |
+
"support": 382.0
|
713 |
+
},
|
714 |
+
"I-Organizations": {
|
715 |
+
"precision": 0.5950653120464441,
|
716 |
+
"recall": 0.8023483365949119,
|
717 |
+
"f1-score": 0.6833333333333333,
|
718 |
+
"support": 511.0
|
719 |
+
},
|
720 |
+
"B-Phenomena": {
|
721 |
+
"precision": 0.2920962199312715,
|
722 |
+
"recall": 0.32075471698113206,
|
723 |
+
"f1-score": 0.3057553956834532,
|
724 |
+
"support": 265.0
|
725 |
+
},
|
726 |
+
"I-Phenomena": {
|
727 |
+
"precision": 0.2720306513409962,
|
728 |
+
"recall": 0.3397129186602871,
|
729 |
+
"f1-score": 0.3021276595744681,
|
730 |
+
"support": 209.0
|
731 |
+
},
|
732 |
+
"B-Physiology": {
|
733 |
+
"precision": 0.6418974499588703,
|
734 |
+
"recall": 0.6137912952281069,
|
735 |
+
"f1-score": 0.6275298217397132,
|
736 |
+
"support": 3814.0
|
737 |
+
},
|
738 |
+
"I-Physiology": {
|
739 |
+
"precision": 0.5521588402143083,
|
740 |
+
"recall": 0.6245989304812835,
|
741 |
+
"f1-score": 0.5861492137838742,
|
742 |
+
"support": 2805.0
|
743 |
+
},
|
744 |
+
"B-Procedures": {
|
745 |
+
"precision": 0.6753959542104437,
|
746 |
+
"recall": 0.6567551082647148,
|
747 |
+
"f1-score": 0.6659451101662157,
|
748 |
+
"support": 6558.0
|
749 |
+
},
|
750 |
+
"I-Procedures": {
|
751 |
+
"precision": 0.7067256367241116,
|
752 |
+
"recall": 0.6688799076212472,
|
753 |
+
"f1-score": 0.6872821653689284,
|
754 |
+
"support": 6928.0
|
755 |
+
},
|
756 |
+
"accuracy": 0.870661701560603,
|
757 |
+
"macro avg": {
|
758 |
+
"precision": 0.6359470912477494,
|
759 |
+
"recall": 0.6432095670571105,
|
760 |
+
"f1-score": 0.6357683931765253,
|
761 |
+
"support": 270152.0
|
762 |
+
},
|
763 |
+
"weighted avg": {
|
764 |
+
"precision": 0.8687298439825012,
|
765 |
+
"recall": 0.870661701560603,
|
766 |
+
"f1-score": 0.8690103283416164,
|
767 |
+
"support": 270152.0
|
768 |
+
}
|
769 |
+
},
    "span_level": {
      "Anatomy": {
        "precision": 0.6564270802266627,
        "recall": 0.67165090021361,
        "f1-score": 0.6639517345399698,
        "support": 3277
      },
      "Chemicals & Drugs": {
        "precision": 0.748099891422367,
        "recall": 0.745066234117329,
        "f1-score": 0.7465799810375187,
        "support": 7398
      },
      "Concepts & Ideas": {
        "precision": 0.5149451381006432,
        "recall": 0.36953570458865054,
        "f1-score": 0.4302877015491622,
        "support": 3683
      },
      "Devices": {
        "precision": 0.44745762711864406,
        "recall": 0.37183098591549296,
        "f1-score": 0.40615384615384614,
        "support": 355
      },
      "Disorders": {
        "precision": 0.6907038512616201,
        "recall": 0.6413861141941053,
        "f1-score": 0.6651320416906452,
        "support": 8109
      },
      "Genes & Molecular Sequences": {
        "precision": 0.5060048038430744,
        "recall": 0.5668161434977579,
        "f1-score": 0.5346869712351946,
        "support": 1115
      },
      "Geographic Areas": {
        "precision": 0.6712328767123288,
        "recall": 0.7374581939799331,
        "f1-score": 0.7027888446215139,
        "support": 598
      },
      "Living Beings": {
        "precision": 0.7177242888402626,
        "recall": 0.7391086629944917,
        "f1-score": 0.7282595288022696,
        "support": 3994
      },
      "Objects": {
        "precision": 0.5180412371134021,
        "recall": 0.5982142857142857,
        "f1-score": 0.5552486187845304,
        "support": 336
      },
      "Occupations": {
        "precision": 0.3671875,
        "recall": 0.47959183673469385,
        "f1-score": 0.415929203539823,
        "support": 196
      },
      "Organizations": {
        "precision": 0.5041666666666667,
        "recall": 0.6335078534031413,
        "f1-score": 0.5614849187935034,
        "support": 382
      },
      "Phenomena": {
        "precision": 0.2056338028169014,
        "recall": 0.27137546468401486,
        "f1-score": 0.23397435897435898,
        "support": 269
      },
      "Physiology": {
        "precision": 0.5598194130925508,
        "recall": 0.5823115053482911,
        "f1-score": 0.570843989769821,
        "support": 3833
      },
      "Procedures": {
        "precision": 0.5965460771177609,
        "recall": 0.607213214123352,
        "f1-score": 0.6018323820967258,
        "support": 6599
      },
      "macro avg": {
        "precision": 0.5502850181666347,
        "recall": 0.5725047928220821,
        "f1-score": 0.558368151542063,
        "support": 40144
      },
      "weighted avg": {
        "precision": 0.6414502549457723,
        "recall": 0.6297578716620167,
        "f1-score": 0.6340080620691133,
        "support": 40144
      }
    }
  }
}
|
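As a quick sanity check on the report above, each `f1-score` should be the harmonic mean of the matching `precision` and `recall`. The sketch below recomputes it for the test-set span-level `Phenomena` entry. The helper name `f1_from_pr` is illustrative only and is not taken from the training code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the test / span_level / "Phenomena" entry of the report.
phenomena = {
    "precision": 0.2056338028169014,
    "recall": 0.27137546468401486,
    "f1-score": 0.23397435897435898,
}

recomputed = f1_from_pr(phenomena["precision"], phenomena["recall"])
difference = abs(recomputed - phenomena["f1-score"])
print(f"recomputed F1 = {recomputed:.6f}, difference = {difference:.2e}")
```

The same check can be run over any other label in the report by swapping in its three values.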
performance_report.md
ADDED
@@ -0,0 +1,180 @@
# Performance on Training Set

## Span Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Anatomy | 0.870 | 0.872 | 0.871 | 11509 |
| Chemicals & Drugs | 0.902 | 0.889 | 0.895 | 22432 |
| Concepts & Ideas | 0.790 | 0.630 | 0.701 | 10273 |
| Devices | 0.805 | 0.791 | 0.798 | 1170 |
| Disorders | 0.847 | 0.817 | 0.832 | 24634 |
| Genes & Molecular Sequences | 0.809 | 0.851 | 0.829 | 3382 |
| Geographic Areas | 0.876 | 0.907 | 0.892 | 1712 |
| Living Beings | 0.888 | 0.888 | 0.888 | 11388 |
| Objects | 0.806 | 0.886 | 0.844 | 851 |
| Occupations | 0.576 | 0.709 | 0.636 | 522 |
| Organizations | 0.797 | 0.885 | 0.839 | 1308 |
| Phenomena | 0.448 | 0.469 | 0.458 | 1131 |
| Physiology | 0.784 | 0.789 | 0.787 | 11726 |
| Procedures | 0.773 | 0.801 | 0.787 | 20145 |
| macro avg | 0.784 | 0.799 | 0.790 | 122183 |
| weighted avg | 0.833 | 0.820 | 0.826 | 122183 |

## Token Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| O | 0.972 | 0.976 | 0.974 | 557052 |
| B-Anatomy | 0.919 | 0.913 | 0.916 | 11456 |
| I-Anatomy | 0.917 | 0.945 | 0.931 | 9946 |
| B-Chemicals & Drugs | 0.943 | 0.934 | 0.938 | 22294 |
| I-Chemicals & Drugs | 0.963 | 0.955 | 0.959 | 31036 |
| B-Concepts & Ideas | 0.877 | 0.652 | 0.748 | 10226 |
| I-Concepts & Ideas | 0.884 | 0.856 | 0.869 | 8750 |
| B-Devices | 0.868 | 0.801 | 0.833 | 1163 |
| I-Devices | 0.850 | 0.886 | 0.868 | 1602 |
| B-Disorders | 0.900 | 0.850 | 0.875 | 24455 |
| I-Disorders | 0.897 | 0.907 | 0.902 | 22954 |
| B-Genes & Molecular Sequences | 0.868 | 0.905 | 0.886 | 3373 |
| I-Genes & Molecular Sequences | 0.891 | 0.935 | 0.913 | 5266 |
| B-Geographic Areas | 0.917 | 0.950 | 0.933 | 1702 |
| I-Geographic Areas | 0.937 | 0.934 | 0.936 | 1370 |
| B-Living Beings | 0.925 | 0.935 | 0.930 | 11285 |
| I-Living Beings | 0.958 | 0.963 | 0.960 | 12389 |
| B-Objects | 0.857 | 0.956 | 0.903 | 832 |
| I-Objects | 0.904 | 0.936 | 0.920 | 705 |
| B-Occupations | 0.649 | 0.759 | 0.699 | 518 |
| I-Occupations | 0.600 | 0.756 | 0.669 | 393 |
| B-Organizations | 0.837 | 0.916 | 0.874 | 1296 |
| I-Organizations | 0.857 | 0.958 | 0.904 | 1766 |
| B-Phenomena | 0.541 | 0.513 | 0.527 | 1124 |
| I-Phenomena | 0.582 | 0.486 | 0.530 | 989 |
| B-Physiology | 0.849 | 0.819 | 0.834 | 11666 |
| I-Physiology | 0.810 | 0.881 | 0.844 | 8875 |
| B-Procedures | 0.829 | 0.844 | 0.837 | 20022 |
| I-Procedures | 0.891 | 0.896 | 0.893 | 20889 |
| macro avg | 0.851 | 0.863 | 0.855 | 805394 |
| weighted avg | 0.949 | 0.949 | 0.949 | 805394 |


# Performance on Validation Set

## Span Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Anatomy | 0.682 | 0.697 | 0.690 | 3881 |
| Chemicals & Drugs | 0.751 | 0.757 | 0.754 | 7490 |
| Concepts & Ideas | 0.487 | 0.339 | 0.400 | 3365 |
| Devices | 0.606 | 0.436 | 0.507 | 493 |
| Disorders | 0.700 | 0.646 | 0.672 | 8320 |
| Genes & Molecular Sequences | 0.546 | 0.580 | 0.563 | 965 |
| Geographic Areas | 0.747 | 0.760 | 0.753 | 678 |
| Living Beings | 0.716 | 0.724 | 0.720 | 4228 |
| Objects | 0.552 | 0.656 | 0.600 | 259 |
| Occupations | 0.466 | 0.480 | 0.473 | 198 |
| Organizations | 0.502 | 0.578 | 0.537 | 453 |
| Phenomena | 0.277 | 0.321 | 0.297 | 274 |
| Physiology | 0.559 | 0.566 | 0.562 | 3751 |
| Procedures | 0.588 | 0.600 | 0.594 | 6509 |
| macro avg | 0.584 | 0.582 | 0.580 | 40864 |
| weighted avg | 0.650 | 0.633 | 0.640 | 40864 |

## Token Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| O | 0.931 | 0.947 | 0.939 | 187057 |
| B-Anatomy | 0.761 | 0.754 | 0.757 | 3862 |
| I-Anatomy | 0.720 | 0.775 | 0.747 | 3271 |
| B-Chemicals & Drugs | 0.821 | 0.823 | 0.822 | 7409 |
| I-Chemicals & Drugs | 0.856 | 0.847 | 0.851 | 10003 |
| B-Concepts & Ideas | 0.590 | 0.360 | 0.447 | 3342 |
| I-Concepts & Ideas | 0.556 | 0.459 | 0.503 | 2857 |
| B-Devices | 0.696 | 0.443 | 0.542 | 492 |
| I-Devices | 0.628 | 0.554 | 0.588 | 688 |
| B-Disorders | 0.770 | 0.688 | 0.727 | 8238 |
| I-Disorders | 0.729 | 0.711 | 0.720 | 7334 |
| B-Genes & Molecular Sequences | 0.643 | 0.656 | 0.649 | 957 |
| I-Genes & Molecular Sequences | 0.677 | 0.710 | 0.693 | 1529 |
| B-Geographic Areas | 0.793 | 0.809 | 0.801 | 674 |
| I-Geographic Areas | 0.746 | 0.768 | 0.756 | 542 |
| B-Living Beings | 0.795 | 0.801 | 0.798 | 4185 |
| I-Living Beings | 0.844 | 0.820 | 0.832 | 4850 |
| B-Objects | 0.610 | 0.715 | 0.658 | 256 |
| I-Objects | 0.675 | 0.672 | 0.674 | 244 |
| B-Occupations | 0.521 | 0.508 | 0.514 | 197 |
| I-Occupations | 0.474 | 0.364 | 0.412 | 173 |
| B-Organizations | 0.579 | 0.650 | 0.613 | 443 |
| I-Organizations | 0.640 | 0.711 | 0.673 | 622 |
| B-Phenomena | 0.341 | 0.343 | 0.342 | 274 |
| I-Phenomena | 0.285 | 0.321 | 0.302 | 212 |
| B-Physiology | 0.640 | 0.604 | 0.621 | 3726 |
| I-Physiology | 0.549 | 0.574 | 0.561 | 2688 |
| B-Procedures | 0.661 | 0.651 | 0.656 | 6454 |
| I-Procedures | 0.679 | 0.662 | 0.671 | 6567 |
| macro avg | 0.662 | 0.645 | 0.651 | 269146 |
| weighted avg | 0.869 | 0.873 | 0.870 | 269146 |


# Performance on Testing Set

## Span Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Anatomy | 0.656 | 0.672 | 0.664 | 3277 |
| Chemicals & Drugs | 0.748 | 0.745 | 0.747 | 7398 |
| Concepts & Ideas | 0.515 | 0.370 | 0.430 | 3683 |
| Devices | 0.447 | 0.372 | 0.406 | 355 |
| Disorders | 0.691 | 0.641 | 0.665 | 8109 |
| Genes & Molecular Sequences | 0.506 | 0.567 | 0.535 | 1115 |
| Geographic Areas | 0.671 | 0.737 | 0.703 | 598 |
| Living Beings | 0.718 | 0.739 | 0.728 | 3994 |
| Objects | 0.518 | 0.598 | 0.555 | 336 |
| Occupations | 0.367 | 0.480 | 0.416 | 196 |
| Organizations | 0.504 | 0.634 | 0.561 | 382 |
| Phenomena | 0.206 | 0.271 | 0.234 | 269 |
| Physiology | 0.560 | 0.582 | 0.571 | 3833 |
| Procedures | 0.597 | 0.607 | 0.602 | 6599 |
| macro avg | 0.550 | 0.573 | 0.558 | 40144 |
| weighted avg | 0.641 | 0.630 | 0.634 | 40144 |

## Token Level

| Label | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| O | 0.933 | 0.945 | 0.939 | 188640 |
| B-Anatomy | 0.733 | 0.725 | 0.729 | 3258 |
| I-Anatomy | 0.715 | 0.766 | 0.740 | 3035 |
| B-Chemicals & Drugs | 0.814 | 0.803 | 0.808 | 7363 |
| I-Chemicals & Drugs | 0.825 | 0.843 | 0.834 | 9659 |
| B-Concepts & Ideas | 0.625 | 0.382 | 0.474 | 3661 |
| I-Concepts & Ideas | 0.562 | 0.511 | 0.535 | 3083 |
| B-Devices | 0.583 | 0.367 | 0.451 | 354 |
| I-Devices | 0.465 | 0.451 | 0.458 | 566 |
| B-Disorders | 0.765 | 0.681 | 0.721 | 8047 |
| I-Disorders | 0.738 | 0.703 | 0.720 | 7601 |
| B-Genes & Molecular Sequences | 0.608 | 0.665 | 0.635 | 1111 |
| I-Genes & Molecular Sequences | 0.607 | 0.656 | 0.631 | 1694 |
| B-Geographic Areas | 0.729 | 0.810 | 0.767 | 594 |
| I-Geographic Areas | 0.720 | 0.684 | 0.702 | 557 |
| B-Living Beings | 0.787 | 0.801 | 0.794 | 3984 |
| I-Living Beings | 0.837 | 0.815 | 0.826 | 4514 |
| B-Objects | 0.589 | 0.645 | 0.615 | 335 |
| I-Objects | 0.698 | 0.642 | 0.669 | 299 |
| B-Occupations | 0.444 | 0.549 | 0.491 | 195 |
| I-Occupations | 0.379 | 0.508 | 0.434 | 130 |
| B-Organizations | 0.551 | 0.675 | 0.607 | 382 |
| I-Organizations | 0.595 | 0.802 | 0.683 | 511 |
| B-Phenomena | 0.292 | 0.321 | 0.306 | 265 |
| I-Phenomena | 0.272 | 0.340 | 0.302 | 209 |
| B-Physiology | 0.642 | 0.614 | 0.628 | 3814 |
| I-Physiology | 0.552 | 0.625 | 0.586 | 2805 |
| B-Procedures | 0.675 | 0.657 | 0.666 | 6558 |
| I-Procedures | 0.707 | 0.669 | 0.687 | 6928 |
| macro avg | 0.636 | 0.643 | 0.636 | 270152 |
| weighted avg | 0.869 | 0.871 | 0.869 | 270152 |
|
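The `macro avg` and `weighted avg` rows in the tables above differ only in how the per-label scores are combined: the macro average treats every label equally, while the weighted average scales each label by its support. The sketch below recomputes both for the test-set span-level F1 column, using the rounded values from the table, so the results should land close to (not exactly on) the reported 0.558 and 0.634. The variable names are illustrative and not taken from the training code.

```python
# (label, F1-score, support) rows copied from the test-set span-level table.
rows = [
    ("Anatomy", 0.664, 3277),
    ("Chemicals & Drugs", 0.747, 7398),
    ("Concepts & Ideas", 0.430, 3683),
    ("Devices", 0.406, 355),
    ("Disorders", 0.665, 8109),
    ("Genes & Molecular Sequences", 0.535, 1115),
    ("Geographic Areas", 0.703, 598),
    ("Living Beings", 0.728, 3994),
    ("Objects", 0.555, 336),
    ("Occupations", 0.416, 196),
    ("Organizations", 0.561, 382),
    ("Phenomena", 0.234, 269),
    ("Physiology", 0.571, 3833),
    ("Procedures", 0.602, 6599),
]

total_support = sum(support for _, _, support in rows)

# Macro average: unweighted mean over labels.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each label contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"total support = {total_support}")
print(f"macro F1      = {macro_f1:.3f}")
print(f"weighted F1   = {weighted_f1:.3f}")
```

The gap between the two averages reflects the label imbalance: low-scoring labels such as Phenomena and Occupations have small support, so they pull the macro average down more than the weighted one.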
rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9eb0be8c505b7f7bc80024c0457befd900a96d3f1eb3524224a115b11dd2ad72
size 14244
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
trainer_state.json
ADDED
@@ -0,0 +1,102 @@
{
  "best_metric": 0.6506678662586591,
  "best_model_checkpoint": "tmp_ner_fantastic-bale-19_38/run-25/checkpoint-660",
  "epoch": 4.0,
  "eval_steps": 500,
  "global_step": 660,
  "is_hyper_param_search": true,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "eval_accuracy": 0.8553461689937804,
      "eval_loss": 0.47332701086997986,
      "eval_macro_f1": 0.5179058864521847,
      "eval_macro_precision": 0.5969084028407209,
      "eval_macro_recall": 0.5063081060851987,
      "eval_runtime": 5.987,
      "eval_samples_per_second": 146.651,
      "eval_steps_per_second": 18.373,
      "step": 165
    },
    {
      "epoch": 2.0,
      "eval_accuracy": 0.8667526175384364,
      "eval_loss": 0.423623263835907,
      "eval_macro_f1": 0.6126875404495669,
      "eval_macro_precision": 0.6399554829251698,
      "eval_macro_recall": 0.6102712004848894,
      "eval_runtime": 5.9737,
      "eval_samples_per_second": 146.978,
      "eval_steps_per_second": 18.414,
      "step": 330
    },
    {
      "epoch": 3.0,
      "eval_accuracy": 0.8703603248794335,
      "eval_loss": 0.4243398904800415,
      "eval_macro_f1": 0.6335065301362576,
      "eval_macro_precision": 0.6801343369395938,
      "eval_macro_recall": 0.6289027870007556,
      "eval_runtime": 5.9773,
      "eval_samples_per_second": 146.888,
      "eval_steps_per_second": 18.403,
      "step": 495
    },
    {
      "epoch": 3.0303030303030303,
      "grad_norm": 0.778854250907898,
      "learning_rate": 9.035786517978707e-05,
      "loss": 0.5906,
      "step": 500
    },
    {
      "epoch": 4.0,
      "eval_accuracy": 0.8725375818329085,
      "eval_loss": 0.43925729393959045,
      "eval_macro_f1": 0.6506678662586591,
      "eval_macro_precision": 0.6623715223268644,
      "eval_macro_recall": 0.6448063227018537,
      "eval_runtime": 5.9602,
      "eval_samples_per_second": 147.311,
      "eval_steps_per_second": 18.456,
      "step": 660
    }
  ],
  "logging_steps": 500,
  "max_steps": 5280,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 32,
  "save_steps": 500,
  "stateful_callbacks": {
    "EarlyStoppingCallback": {
      "args": {
        "early_stopping_patience": 3,
        "early_stopping_threshold": 0.001
      },
      "attributes": {
        "early_stopping_patience_counter": 0
      }
    },
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 3213359444608236.0,
  "train_batch_size": 16,
  "trial_name": null,
  "trial_params": {
    "learning_rate": 9.767344966191627e-05,
    "per_device_train_batch_size": 16,
    "warmup_ratio": 0.021367464793327073,
    "weight_decay": 0.025286446963170207
  }
}
|
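The numbers in `trainer_state.json` can be cross-checked against each other. The sketch below assumes the run used a linear warmup followed by linear decay to zero (the file itself does not name the scheduler, so this is an assumption) and that the warmup length is the ceiling of `warmup_ratio * max_steps`. Under those assumptions it predicts the learning rate at step 500 and compares it with the logged value. The function `linear_schedule_lr` is a hypothetical helper written for this check, not code from the training repository.

```python
import math

# Values copied from trainer_state.json.
initial_lr = 9.767344966191627e-05   # trial_params.learning_rate
warmup_ratio = 0.021367464793327073  # trial_params.warmup_ratio
max_steps = 5280                     # max_steps
num_train_epochs = 32                # num_train_epochs
steps_per_epoch = 165                # step of the first end-of-epoch evaluation
logged_step = 500                    # step of the only training-loss log entry
logged_lr = 9.035786517978707e-05    # learning_rate logged at that step


def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Assumed schedule: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed rounding rule for the warmup length.
warmup_steps = math.ceil(warmup_ratio * max_steps)
predicted_lr = linear_schedule_lr(logged_step, initial_lr, warmup_steps, max_steps)

print(f"planned steps   = {steps_per_epoch * num_train_epochs}")
print(f"warmup steps    = {warmup_steps}")
print(f"predicted lr    = {predicted_lr:.12e}")
print(f"logged lr       = {logged_lr:.12e}")
print(f"epoch at step   = {logged_step / steps_per_epoch:.4f}")
```

If the predicted and logged learning rates agree closely, the logged value is consistent with the assumed schedule. Note also that `global_step` is 660 out of a planned 5280, so this saved state corresponds to the checkpoint after epoch 4 rather than a completed 32-epoch run.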
vocab.txt
ADDED
The diff for this file is too large to render.
See raw diff