A Universal Dependencies parser for Icelandic, built on top of a Transformer language model.

Scores on pre-tokenized test data:

| Metric    | Precision | Recall | F1 Score | AlignedAcc |
|-----------|-----------|--------|----------|------------|
| Tokens    |  99.70 |  99.77 |  99.73 |       |
| Sentences | 100.00 | 100.00 | 100.00 |       |
| Words     |  99.62 |  99.61 |  99.61 |       |
| UPOS      |  96.99 |  96.97 |  96.98 | 97.36 |
| XPOS      |  93.65 |  93.64 |  93.65 | 94.01 |
| UFeats    |  91.31 |  91.29 |  91.30 | 91.65 |
| AllTags   |  86.86 |  86.85 |  86.86 | 87.19 |
| Lemmas    |  95.83 |  95.81 |  95.82 | 96.19 |
| UAS       |  89.01 |  89.00 |  89.00 | 89.35 |
| LAS       |  85.72 |  85.70 |  85.71 | 86.04 |
| CLAS      |  81.39 |  80.91 |  81.15 | 81.34 |
| MLAS      |  69.21 |  68.81 |  69.01 | 69.17 |
| BLEX      |  77.44 |  76.99 |  77.22 | 77.40 |

Scores on untokenized test data:

| Metric    | Precision | Recall | F1 Score | AlignedAcc |
|-----------|-----------|--------|----------|------------|
| Tokens    |  99.50 |  99.66 |  99.58 |       |
| Sentences | 100.00 | 100.00 | 100.00 |       |
| Words     |  99.42 |  99.50 |  99.46 |       |
| UPOS      |  96.80 |  96.88 |  96.84 | 97.37 |
| XPOS      |  93.48 |  93.56 |  93.52 | 94.03 |
| UFeats    |  91.13 |  91.20 |  91.16 | 91.66 |
| AllTags   |  86.71 |  86.78 |  86.75 | 87.22 |
| Lemmas    |  95.66 |  95.74 |  95.70 | 96.22 |
| UAS       |  88.76 |  88.83 |  88.80 | 89.28 |
| LAS       |  85.49 |  85.55 |  85.52 | 85.99 |
| CLAS      |  81.19 |  80.73 |  80.96 | 81.31 |
| MLAS      |  69.06 |  68.67 |  68.87 | 69.16 |
| BLEX      |  77.28 |  76.84 |  77.06 | 77.39 |
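These tables follow the output format of the CoNLL 2018 shared task evaluation script (conll18_ud_eval.py). As a rough sketch of how such numbers can be reproduced, the script can also be used as a Python module; the .conllu file names below are placeholders, not files shipped with this repo:

```python
# Minimal sketch of scoring parser output with conll18_ud_eval.py
# (CoNLL 2018 shared task evaluation); file paths are placeholders.
from conll18_ud_eval import load_conllu_file, evaluate

gold = load_conllu_file("gold_test.conllu")      # reference treebank
system = load_conllu_file("parsed_test.conllu")  # parser output

# evaluate() returns a dict of Score objects with precision, recall,
# f1 and, where applicable, aligned accuracy (all as 0-1 fractions).
scores = evaluate(gold, system)
for metric in ("Tokens", "Sentences", "Words", "UPOS", "UAS", "LAS"):
    print(metric, round(scores[metric].f1 * 100, 2))
```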

To use the model, you need to set up COMBO, which makes it possible to use word embeddings from a pretrained transformer model (electra-base-igc-is).

```bash
git submodule update --init --recursive
pip install -U pip setuptools wheel
pip install --index-url https://pypi.clarin-pl.eu/simple combo==1.0.5
```

On Python 3.9, you might also need to install Cython first:

```bash
pip install -U pip cython
```

Then you can run the model as is done in parse_file.py; a minimal sketch along those lines follows.
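The sketch below is based on the COMBO 1.x predictor API (combo.predict.COMBO); the model archive path, the sample sentence, and the exact token field names are assumptions, so treat it as an outline rather than a drop-in replacement for parse_file.py:

```python
# Minimal sketch of parsing Icelandic text with the COMBO 1.x API;
# the archive path is a placeholder for the model files in this repo.
from combo.predict import COMBO

nlp = COMBO.from_pretrained("model.tar.gz")

# The predictor returns a sentence with one entry per token, carrying
# tags, morphological features, lemmas and dependency arcs.
sentence = nlp("Hún las bókina í gær.")
for token in sentence.tokens:
    print(token.id, token.token, token.lemma, token.upostag,
          token.head, token.deprel)
```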

For further instructions, see https://gitlab.clarin-pl.eu/syntactic-tools/combo.

The Tokenizer directory is a clone of Miðeind's tokenizer.
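As a small illustration of that tokenizer's Python API (assuming the tokenize generator and TOK constants from Miðeind's tokenizer package; the sample sentence is arbitrary):

```python
# Minimal sketch of tokenizing Icelandic text with Miðeind's tokenizer.
from tokenizer import TOK, tokenize

for token in tokenize("Hér er setning, sem þarf að þátta."):
    # Sentence-boundary sentinels have no text, hence the fallback.
    print(TOK.descr[token.kind], token.txt or "")
```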

The directory transformer_models/ contains the pretrained model electra-base-igc-is, trained by Jón Friðrik Daðason, which supplies the parser with contextual embeddings and attention.
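If you want to inspect the encoder on its own, it should be loadable with Hugging Face transformers; the local path below assumes this repository's layout, and PyTorch is assumed to be installed:

```python
# Minimal sketch of loading the bundled ELECTRA encoder with
# Hugging Face transformers; the path assumes this repo's layout.
from transformers import AutoModel, AutoTokenizer

model_dir = "transformer_models/electra-base-igc-is"
tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir)

inputs = tok("Þetta er prufusetning.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden size)
```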

License

Apache License 2.0: https://opensource.org/licenses/Apache-2.0
