Glasgow-AI4BioMed/bioner_gnormplus

@@ -26,10 +26,12 @@ ner_pipeline = pipeline("token-classification",
                         aggregation_strategy="max")
 # Apply it to some text
-ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes for NSCLC patients receiving erlotinib.")
 # Output:
-# [ {"entity_group": "FamilyName", "score": 0.44405, "word": "egfr", "start": 0, "end": 4},
 ```
 ## Dataset Info
@@ -38,7 +40,7 @@ ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes
 The dataset should be cited with: Wei, Chih-Hsuan, Hung-Yu Kao, and Zhiyong Lu. "GNormPlus: an integrative approach for tagging genes, gene families, and protein domains." BioMed research international 2015.1 (2015): 918710. DOI: [10.1155/2015/918710](https://doi.org/10.1155/2015/918710)
-**Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [prepare_bc5cdr.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_bc5cdr.py).
 ## Performance

                         aggregation_strategy="max")
 # Apply it to some text
+ner_pipeline("ZNF598 is a Zinc finger containing E3 ubiquitin ligase.")
 # Output:
+# [ {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
+#   {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
+#   {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54} ]
 ```
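The `word` field in the output above is lowercased, while `start` and `end` index into the original input string. A minimal sketch of post-processing, assuming the output shape shown in the example: it recovers the original-case surface form from the offsets, drops low-confidence spans, and groups the rest by entity type. The helper name `group_entities` and the `min_score` threshold of 0.8 are illustrative choices, not part of the model or the `transformers` API.

```python
from collections import defaultdict

# Input text and pipeline output copied from the example above.
text = "ZNF598 is a Zinc finger containing E3 ubiquitin ligase."
entities = [
    {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
    {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
    {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54},
]


def group_entities(text, entities, min_score=0.8):
    """Group entity mentions by type, keeping only spans scored >= min_score.

    Hypothetical helper: the surface form is sliced from the original text
    using the character offsets, so the original casing is preserved.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            mention = text[ent["start"]:ent["end"]]
            grouped[ent["entity_group"]].append(mention)
    return dict(grouped)


grouped = group_entities(text, entities)
print(grouped)
# {'Gene': ['ZNF598'], 'FamilyName': ['E3 ubiquitin ligase']}
```

With the threshold at 0.8, the `DomainMotif` span (score 0.74961) is filtered out; lowering `min_score` keeps it.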
 ## Dataset Info
 The dataset should be cited with: Wei, Chih-Hsuan, Hung-Yu Kao, and Zhiyong Lu. "GNormPlus: an integrative approach for tagging genes, gene families, and protein domains." BioMed research international 2015.1 (2015): 918710. DOI: [10.1155/2015/918710](https://doi.org/10.1155/2015/918710)
+**Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [prepare_gnormplus.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_gnormplus.py).
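A 75/25 split like the one described above can be sketched as follows. This is an illustration only, not the contents of `prepare_gnormplus.py`: the function name `split_train_val`, the document-level shuffle, and the fixed seed are all assumptions.

```python
import random


def split_train_val(documents, val_fraction=0.25, seed=42):
    """Shuffle documents reproducibly and split them into (train, validation).

    Hypothetical helper: the seed and the shuffle-then-slice strategy are
    assumptions, not taken from the repository's preprocessing script.
    """
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # local RNG, leaves global state untouched
    n_val = round(len(docs) * val_fraction)
    return docs[n_val:], docs[:n_val]


# Example with placeholder document IDs.
train_docs, val_docs = split_train_val([f"doc{i}" for i in range(100)])
print(len(train_docs), len(val_docs))
# 75 25
```

Splitting at the document level, rather than the sentence level, keeps all annotations from one abstract in the same partition.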
 ## Performance