Spanish
Commit a68b4ec (verified) by richardjonker2000 · 1 Parent(s): e99bf7b

Update README.md

Files changed (1): README.md +24 -35
README.md CHANGED
@@ -8,34 +8,33 @@ metrics:
  - f1
  ---
 
- # Model Card for Model ID
 
- <!-- Provide a quick summary of what the model is/does. -->
- Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements.
- It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances.
- Classes: symptoms, procedures, diseases, chemicals, and proteins
 
 
  ## Model Details
 
  ### Model Description
 
- <!-- Provide a longer summary of what this model is. -->
-
-
  - **Developed by:** IEETA
- - **Shared by [optional]:** IEETA
  - **Model type:** Multi-Head-CRF, Roberta Base
  - **Language(s) (NLP):** Spanish
  - **License:** MIT
- - **Finetuned from model [optional]:** lcampillos/roberta-es-clinical-trials-ner
 
- ### Model Sources [optional]
 
- <!-- Provide the basic links for the model. -->
 
- - **Repository:** https://github.com/ieeta-pt/Multi-Head-CR
- - **Paper:** [More Information Needed]
 
  ## Uses
 
@@ -43,37 +42,35 @@ Note we do not take any liability for the use of the model in any professional/m
 
  ## How to Get Started with the Model
 
- Please refer to our GitHub repository for more information on how to train the model and run inference. https://github.com/ieeta-pt/Multi-Head-CRF
 
  ## Training Details
 
  ### Training Data
 
  The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 
- [More Information Needed]
-
-
-
-
- ### Speeds, Sizes, Times [optional]
-
- The models were trained using an Nvidia Quadra RTX 8000. The models for 5 classes took approximately 1 hour to train and occupies around 1gb of disk space. Further this model shows linear complexity (+8 minutes) per entity class to classify.
 
 
  ### Testing Data, Factors & Metrics
 
  #### Testing Data
  The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 
-
  #### Metrics
 
  The models were evaluated using the F1 score metric, the standard for entity recognition tasks.
 
  ### Results
 
- We provide 4 seperate models with various hyperparmeter changes:
 
  | HLs per head | Augmentation | Percentage Tags | Augmentation Probability | F1 |
  |--------------|--------------|-----------------|--------------------------|--------|
@@ -84,17 +81,9 @@ We provide 4 seperate models with various hyperparmeter changes:
 
  All models are trained with a context size of 32 for 60 epochs.
 
- #### Summary
-
-
- ## Citation [optional]
 
 
  **BibTeX:**
 
- [More Information Needed]
-
-
-
-
-
 
@@ -8,34 +8,33 @@ metrics:
  - f1
  ---
 
+ # Model Card for Biomedical Named Entity Recognition in Spanish Clinical Texts
 
+ Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements. It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances. The classes it recognizes include symptoms, procedures, diseases, chemicals, and proteins.
 
+ We provide 4 different models, available as branches of this repository.
 
  ## Model Details
 
  ### Model Description
 
  - **Developed by:** IEETA
  - **Model type:** Multi-Head-CRF, Roberta Base
  - **Language(s) (NLP):** Spanish
  - **License:** MIT
+ - **Finetuned from model:** lcampillos/roberta-es-clinical-trials-ner
 
+ ### Model Sources
 
+ - **Repository:** [IEETA Multi-Head-CRF GitHub](https://github.com/ieeta-pt/Multi-Head-CRF)
+ - **Paper:** Multi-head CRF classifier for biomedical multi-class Named Entity Recognition on Spanish clinical notes [Awaiting Publication]
+ *Authors:*
+ - Richard A A Jonker ([ORCID: 0000-0002-3806-6940](https://orcid.org/0000-0002-3806-6940))
+ - Tiago Almeida ([ORCID: 0000-0002-4258-3350](https://orcid.org/0000-0002-4258-3350))
+ - Rui Antunes ([ORCID: 0000-0003-3533-8872](https://orcid.org/0000-0003-3533-8872))
+ - João R Almeida ([ORCID: 0000-0003-0729-2264](https://orcid.org/0000-0003-0729-2264))
+ - Sérgio Matos ([ORCID: 0000-0003-1941-3983](https://orcid.org/0000-0003-1941-3983))
 
 
  ## Uses
 
@@ -43,37 +42,35 @@ Note we do not take any liability for the use of the model in any professional/m
 
  ## How to Get Started with the Model
 
+ Please refer to our GitHub repository for more information on how to train the model and run inference: [IEETA Multi-Head-CRF GitHub](https://github.com/ieeta-pt/Multi-Head-CRF)
 
  ## Training Details
 
  ### Training Data
 
  The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
+ The dataset used consists of 4 separate datasets:
+ - [MedProcNer](https://zenodo.org/records/8224056)
+ - [DisTEMIST](https://zenodo.org/records/7614764)
+ - [PharmaCoNER](https://zenodo.org/records/4270158)
+ - [SympTEMIST](https://zenodo.org/records/10635215)
 
+ ### Speeds, Sizes, Times
 
+ The models were trained using an Nvidia Quadro RTX 8000. The models for 5 classes took approximately 1 hour to train and occupy around 1GB of disk space. Additionally, this model shows linear complexity (+8 minutes) per entity class to classify.
 
  ### Testing Data, Factors & Metrics
 
  #### Testing Data
  The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 
  #### Metrics
 
  The models were evaluated using the F1 score metric, the standard for entity recognition tasks.
 
  ### Results
 
+ We provide 4 separate models with various hyperparameter changes:
 
  | HLs per head | Augmentation | Percentage Tags | Augmentation Probability | F1 |
  |--------------|--------------|-----------------|--------------------------|--------|
@@ -84,17 +81,9 @@ We provide 4 seperate models with various hyperparmeter changes:
 
  All models are trained with a context size of 32 for 60 epochs.
 
 
+ ## Citation
 
  **BibTeX:**
 
+ [Awaiting Publication]