jakelever committed on
Commit 2320c48 · verified · 1 parent: 272cb7c

Change example and fix preprocessing script link

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -26,10 +26,12 @@ ner_pipeline = pipeline("token-classification",
  aggregation_strategy="max")
 
  # Apply it to some text
- ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes for NSCLC patients receiving erlotinib.")
+ ner_pipeline("ZNF598 is a Zinc finger containing E3 ubiquitin ligase.")
 
  # Output:
- # [ {"entity_group": "FamilyName", "score": 0.44405, "word": "egfr", "start": 0, "end": 4},
+ # [ {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
+ # {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
+ # {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54} ]
  ```
 
  ## Dataset Info
@@ -38,7 +40,7 @@ ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes
 
  The dataset should be cited with: Wei, Chih-Hsuan, Hung-Yu Kao, and Zhiyong Lu. "GNormPlus: an integrative approach for tagging genes, gene families, and protein domains." BioMed research international 2015.1 (2015): 918710. DOI: [10.1155/2015/918710](https://doi.org/10.1155/2015/918710)
 
- **Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [prepare_bc5cdr.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_bc5cdr.py).
+ **Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [prepare_gnormplus.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_gnormplus.py).
 
  ## Performance
 
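The new example output in the README can be checked by hand. The following is a minimal Python sketch. It assumes that the `start`/`end` fields are Python-style character offsets into the input string, with `start` inclusive and `end` exclusive, which is what the values above suggest. Under that assumption, slicing the original text recovers each mention with its original casing, while the `word` field in the output shown is lowercased (presumably because the tokenizer is uncased). The entity list is copied from the README output rather than produced by running the model, and `recover_spans` is a hypothetical helper written for illustration.

```python
# Example text and aggregated output, copied from the updated README
text = "ZNF598 is a Zinc finger containing E3 ubiquitin ligase."

entities = [
    {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
    {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
    {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54},
]


def recover_spans(text, entities):
    """Hypothetical helper: return (label, original-case mention) pairs.

    Assumes start is inclusive and end is exclusive, so text[start:end]
    is the mention exactly as written in the input.
    """
    spans = []
    for ent in entities:
        surface = text[ent["start"]:ent["end"]]
        # Sanity check: the offsets agree with the lowercased `word` field
        assert surface.lower() == ent["word"]
        spans.append((ent["entity_group"], surface))
    return spans


for label, surface in recover_spans(text, entities):
    print(f"{label}: {surface}")
```

Slicing by offsets rather than using `word` directly keeps the original casing (for example "ZNF598" rather than "znf598"), which is usually what downstream normalization or display needs.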