metadata

tags:
  - text2text-generation
  - definition-modeling
metrics:
  - rouge
model-index:
  - name: mt0-definition-en-xl
    results: []
language:
  - en
widget:
  - text: He ate a sweet apple. What is the definition of apple?
    example_title: Definition generation
  - text: >-
      The paper contains a number of original ideas about color perception. What
      is the definition of original?
    example_title: Definition generation
license: cc-by-sa-4.0
datasets:
  - marksverdhei/wordnet-definitions-en-2021

mT0-Definition-En XL

This model is a version of mT0 XL fine-tuned on a dataset of English definitions and usage examples.

It generates definitions of English words in context. Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?"
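The input format can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, not the authors' code: it assumes the checkpoint is published under the repository id `ltg/mt0-definition-en-xl` and loads through `AutoModelForSeq2SeqLM`. The helper names `build_prompt` and `define` and the `max_new_tokens` value are hypothetical choices made for illustration.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Join a usage example and the instruction question into one input string."""
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def define(usage_example: str, target_word: str,
           model_id: str = "ltg/mt0-definition-en-xl") -> str:
    """Generate a definition of target_word as it is used in usage_example.

    model_id is an assumed repository id; adjust it if the checkpoint lives elsewhere.
    """
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    # max_new_tokens=60 is an assumed cap, not a documented setting.
    output_ids = model.generate(**inputs, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the XL checkpoint, so it is left commented out):
# print(define("He ate a sweet apple.", "apple"))
```

Loading the tokenizer and model inside `define` keeps the sketch short; for repeated calls it would be cheaper to load them once and reuse them.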

Model description

See details in the paper "Enriching Word Usage Graphs with Cluster Definitions" (LREC-COLING 2024) by Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions. Generated definitions may contain biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Three datasets were used to fine-tune the model:

WordNet (Ishiwatari et al., NAACL 2019), also available on HF
Oxford dictionary or CHA (Gadetsky et al., ACL 2018)
English subset of CODWOE (Mickus et al., SemEval 2022)

Training results

mT0-Definition-En XL achieves the following results on the concatenated validation sets from WordNet and the Oxford dictionary:

Loss: 1.7210
Rouge1: 41.5067
Rouge2: 23.7149
RougeL: 39.138
RougeLsum: 39.1647
Gen Len: 15.1578
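To give a sense of what the ROUGE-1 figure measures, here is a simplified sketch of unigram-overlap F1 between a generated definition and a reference definition. It is not the evaluation script behind the numbers above: the whitespace tokenization, the function name `rouge1_f1`, and the example strings are assumptions made for illustration, and the scores above appear to be reported on a 0-100 scale.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two lowercased, whitespace-tokenized strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction and reference, scaled to 0-100.
score = rouge1_f1("a round fruit", "a sweet round fruit")
print(round(100 * score, 4))  # prints 85.7143
```

Standard ROUGE implementations add their own tokenization and optional stemming, so their scores can differ from this simplified version.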

Training procedure

mT0-Definition-En XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20.0
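The two totals in the list follow from the other values. The short check below assumes that `train_batch_size` and `eval_batch_size` are per-device sizes, which is consistent with the reported totals; the variable names are illustrative.

```python
per_device_train_batch_size = 4  # train_batch_size in the list above
per_device_eval_batch_size = 4   # eval_batch_size in the list above
num_devices = 8
gradient_accumulation_steps = 4

# Each optimizer step sees one batch per device per accumulation step.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # prints 128 32
```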

Framework versions

Transformers 4.30.2
Pytorch 1.13.1+rocm5.2
Datasets 2.12.0
Tokenizers 0.12.1
