Model Description
This model is a fine-tuned version of BioBERT on the GENIA dataset for dependency parsing. We adapt the approach from the paper *Viable Dependency Parsing as Sequence Labeling* to create the labels and use them as ground truth.
Intended Use
This model is intended for analyzing the syntactic structure of biomedical text, identifying the relationships between words that make up the grammatical structure of a sentence.
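As a minimal usage sketch (assuming the checkpoint loads as a standard token-classification model and exposes the Dep2Label tags through its `id2label` config; wordpiece-to-word alignment is left out for brevity):

```python
# Minimal inference sketch: dependency parsing cast as token classification.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_id = "almo762/biobert-dependency-parsing-v1.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

sentence = "IL-2 gene expression requires NF-kappa B activation ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    # Labels are predicted per wordpiece; real use should merge subword pieces.
    print(token, model.config.id2label[label_id.item()])
```

The predicted tags can then be decoded back into head/relation pairs with the Dep2Label decoding code.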
Training Data
This model was trained on the GENIA dataset. The dataset is in CoNLL format and has been split into train, validation, and test sets (an illustrative fragment follows the table below).
Dataset | Number of sentences |
---|---|
Train | 14326 |
Dev | 1361 |
Test | 1360 |
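For illustration, a CoNLL-style file lists one token per line with, among other columns, a head index and a dependency relation. The fragment below is hypothetical, and the exact column layout depends on the CoNLL variant:

```
1  IL-2        _  NN  _  _  3  compound  _  _
2  gene        _  NN  _  _  3  compound  _  _
3  expression  _  NN  _  _  4  nsubj     _  _
4  requires    _  VB  _  _  0  root      _  _
```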
Training Method
- We collected the data and converted it into a trainable dataset by adapting the code from the Dep2Label repository. We thank this repository for showing us how to use Dep2Label.
- We use the `rel-pos` encoding for the labels; this encoder generates 1538 distinct labels (see the encoding sketch after this list).
- The pre-trained model we use is BioBERT from DMIS-Lab (`dmis-lab/biobert-base-cased-v1.2`), which suits the biomedical domain. We use the `.safetensors` version, provided by Hugging Face staff in a pull request.
- Since we treat the labels created with Dep2Label as ground truth, we freeze the embedding layer and the classifier layer during training to preserve the pre-trained model's knowledge and prevent overfitting (see the freezing sketch after this list).
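To make the `rel-pos` idea concrete, here is a small sketch of a relative-positional encoder in the spirit of Dep2Label. This is our own illustration: the actual encoder, including its treatment of the root and any PoS-anchored variant, may differ in detail.

```python
# Illustrative rel-pos style encoder: one label per word, combining the
# head's position relative to the word with the dependency relation.
def encode_relative_positional(heads, deprels):
    """heads[i] is the 1-based head index of word i+1 (0 = root)."""
    labels = []
    for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
        offset = head - i  # e.g. +1: head is the next word, -2: two words back
        labels.append(f"{offset:+d}@{rel}")
    return labels

# Example matching the CoNLL fragment above ("IL-2 gene expression requires").
print(encode_relative_positional([3, 3, 4, 0],
                                 ["compound", "compound", "nsubj", "root"]))
# -> ['+2@compound', '+1@compound', '+1@nsubj', '-4@root']
```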
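And a minimal sketch of the freezing step, assuming a standard Hugging Face `BertForTokenClassification`-style module layout in which the embeddings live under `model.bert.embeddings` and the classification head under `model.classifier`:

```python
# Freeze the embedding layer and the classifier head; only the encoder
# layers receive gradient updates during fine-tuning.
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.2",  # base checkpoint named in this card
    num_labels=1538,                     # number of rel-pos labels
)

for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = False
```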
Result
We trained and evaluated on a Google Colab T4 GPU for 5 epochs. Here are the results on the dev and test splits described above.
Metric | Dev | Test |
---|---|---|
F1 | 83.3 | 79.1 |
Precision | 83.2 | 79.0 |
Recall | 83.4 | 79.3 |