Model Description

This model is a fine-tuned version of BioBERT on the GENIA dataset for dependency parsing. We adapt the approach of the paper Viable Dependency Parsing as Sequence Labeling to generate per-token labels, which we use as ground truth.

Intended Use

This model is intended for analyzing the syntactic structure of biomedical text, identifying the grammatical relationships between the words of a sentence.

Training Data

This model was trained on the CRAFT dataset. The dataset is in CoNLL format and does not come with predefined train/validation/test splits, so we shuffled it and split it with a 0.8/0.1/0.1 ratio.

Dataset   Number of sentences
Train     24663
Dev       3082
Test      3082
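The shuffle-and-split step can be sketched as below. This is a minimal illustration, not the card's actual code: the seed and function name are assumptions, since the card does not state how the randomization was done.

```python
import random

def split_dataset(sentences, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle the sentences and split them into train/dev/test by ratio."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)  # deterministic shuffle (seed is an assumption)
    n = len(sents)
    n_train = int(n * ratios[0])
    n_dev = int(n * ratios[1])
    train = sents[:n_train]
    dev = sents[n_train:n_train + n_dev]
    test = sents[n_train + n_dev:]
    return train, dev, test
```

Any remainder after rounding falls into the test split, so the three parts always cover the whole dataset.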

Training method

  • We collected the data and converted it into a trainable dataset by adapting code from the Dep2Label repository, which we thank for showing us how to apply the encoding.
  • We use the rel-pos encoding for the labels; this encoder generates 1538 distinct labels.
  • The pre-trained model is BioBERT from DMIS-Lab, which suits the biomedical domain. We use the .safetensors version provided by Hugging Face staff in the pull request.
  • We freeze the classifier layer during training to prevent overfitting.
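To illustrate the rel-pos idea, here is a minimal sketch of a relative PoS-based head encoding in the spirit of Viable Dependency Parsing as Sequence Labeling: each word's label records its dependency relation plus its head's position as "the o-th word with PoS tag p to the right (o > 0) or left (o < 0)". The exact label format and root handling in Dep2Label may differ; this is an assumption for illustration.

```python
def encode_relpos(heads, postags, deprels):
    """Encode a dependency tree as one "o@p@rel" label per word.

    heads are 1-based head indices (0 = root).  "o@p@rel" means the head is
    the o-th word with PoS tag p to the right (o > 0) or left (o < 0).
    """
    labels = []
    for i in range(1, len(heads) + 1):
        h, rel = heads[i - 1], deprels[i - 1]
        if h == 0:
            labels.append(f"0@ROOT@{rel}")  # special label for the root word
            continue
        p = postags[h - 1]
        if h > i:
            # head is to the right: count words with PoS p in (i, h]
            o = sum(1 for j in range(i + 1, h + 1) if postags[j - 1] == p)
        else:
            # head is to the left: count words with PoS p in [h, i), negated
            o = -sum(1 for j in range(h, i) if postags[j - 1] == p)
        labels.append(f"{o}@{p}@{rel}")
    return labels
```

For "The dog barks" (heads 2, 3, 0), "The" gets the label `1@NN@det`: its head is the first noun to its right. Applying such an encoder over a full treebank yields the label inventory; 1538 is the size obtained on this data.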

Results

We trained and evaluated the model on a Google Colab T4 GPU for 4 epochs. Here are the results on the dev and test splits.

Metric      Dev    Test
F1          83.4   83.9
Precision   82.2   82.6
Recall      85.6   86.0
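The card does not state the exact metric definition. A common choice for dependency parsing is precision/recall/F1 over decoded (head, dependent, relation) arcs, sketched below under that assumption; precision and recall can then differ when some predicted labels fail to decode into valid arcs.

```python
def arc_prf(gold_arcs, pred_arcs):
    """Micro precision/recall/F1 over (head, dependent, relation) arc sets."""
    gold, pred = set(gold_arcs), set(pred_arcs)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, predicting one of two gold arcs correctly and nothing else gives precision 1.0, recall 0.5, and F1 about 0.67.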

Demo

We have included a demo for you to use the model. Here is the link.

Model details

Model ID: almo762/biobert-dependency-parsing-v1.7
Model size: 109M parameters (F32, Safetensors)