TiRoBERTa Fine-tuned for Multi-task Abusiveness, Sentiment, and Topic Classification
This model is a fine-tuned version of TiRoBERTa on the TiALD dataset.
The Tigrinya Abusive Language Detection (TiALD) dataset is a large-scale, multi-task benchmark for abusive language detection in Tigrinya. It consists of 13,717 YouTube comments annotated for three tasks: abusiveness, sentiment, and topic. The comments are written in both the Ge’ez script and prevalent non-standard Latin transliterations, mirroring real-world usage.
⚠️ The dataset contains explicit, obscene, and potentially hateful language. It should be used for research purposes only. ⚠️
This work accompanies the paper "A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings".
Model Usage
```python
from transformers import pipeline

tiald_multitask = pipeline("text-classification", model="fgaim/tiroberta-tiald-all-tasks", top_k=11)
tiald_multitask("<text-to-classify>")
```
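With `top_k=11`, the pipeline returns a score for every label across the three tasks as a list of `{label, score}` dictionaries. Continuing the snippet above, a minimal sketch of inspecting that output (the exact label names come from the model's `config.id2label` and are not assumed here):

```python
# Scores for all labels, printed in descending order of confidence.
# Group or filter them per task (abusiveness, sentiment, topic) as needed.
predictions = tiald_multitask("<text-to-classify>")
for pred in sorted(predictions, key=lambda p: p["score"], reverse=True):
    print(f"{pred['label']}: {pred['score']:.3f}")
```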
Performance Metrics
This model achieves the following results on the TiALD test set:
"abusiveness_metrics": {
"accuracy": 0.8611111111111112,
"macro_f1": 0.8611109396431353,
"macro_recall": 0.8611111111111112,
"macro_precision": 0.8611128943846637,
"weighted_f1": 0.8611109396431355,
"weighted_recall": 0.8611111111111112,
"weighted_precision": 0.8611128943846637
},
"topic_metrics": {
"accuracy": 0.6155555555555555,
"macro_f1": 0.5491185274678864,
"macro_recall": 0.5143416011263588,
"macro_precision": 0.7341640739780486,
"weighted_f1": 0.5944096153417657,
"weighted_recall": 0.6155555555555555,
"weighted_precision": 0.6870800624645906
},
"sentiment_metrics": {
"accuracy": 0.6533333333333333,
"macro_f1": 0.5340845253007789,
"macro_recall": 0.5410170159158625,
"macro_precision": 0.534652401599494,
"weighted_f1": 0.6620101614004723,
"weighted_recall": 0.6533333333333333,
"weighted_precision": 0.6750245466592532
}
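A minimal sketch of how per-task scores like these can be computed with scikit-learn, assuming lists of gold and predicted labels for a single task (the function and variable names below are illustrative, not part of the released code):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def task_metrics(y_true, y_pred):
    """Accuracy plus macro- and weighted-averaged precision, recall, and F1."""
    metrics = {"accuracy": accuracy_score(y_true, y_pred)}
    for avg in ("macro", "weighted"):
        p, r, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average=avg, zero_division=0
        )
        metrics[f"{avg}_f1"] = f1
        metrics[f"{avg}_recall"] = r
        metrics[f"{avg}_precision"] = p
    return metrics

# Toy example with placeholder labels for the abusiveness task:
print(task_metrics(["abusive", "not_abusive"], ["abusive", "abusive"]))
```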
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7.0
- seed: 42
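A hedged sketch of how these settings map onto Hugging Face `TrainingArguments` (the multi-task heads, data collator, and Trainer setup are omitted, and the output directory is illustrative):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tiroberta-tiald-all-tasks",  # illustrative path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    num_train_epochs=7.0,
    lr_scheduler_type="linear",
    seed=42,
    # Reported Adam settings (these are also the library defaults):
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```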
Intended Usage
The TiALD dataset and models are designed to support:
- Research in abusive language detection in low-resource languages
- Context-aware abuse, sentiment, and topic modeling
- Multi-task and transfer learning with digraphic scripts
- Evaluation of multilingual and fine-tuned language models
Researchers and developers should avoid using this dataset for direct moderation or enforcement tasks without human oversight.
Ethical Considerations
- Sensitive content: Contains toxic and offensive language. Use for research purposes only.
- Cultural sensitivity: Abuse is context-dependent; annotations were made by native speakers to account for cultural nuance.
- Bias mitigation: Data sampling and annotation were carefully designed to minimize reinforcement of stereotypes.
- Privacy: All the source content for the dataset is publicly available on YouTube.
- Respect for expression: The dataset should not be used for automated censorship without human review.
This research received IRB approval (Ref: KH2022-133) and followed ethical data collection and annotation practices, including informed consent of annotators.
Citation
If you use this model or the TiALD dataset in your work, please cite:
```bibtex
@misc{gaim-etal-2025-tiald-benchmark,
  title         = {A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings},
  author        = {Fitsum Gaim and Hoyun Song and Huije Lee and Changgeon Ko and Eui Jun Hwang and Jong C. Park},
  year          = {2025},
  eprint        = {2505.12116},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.12116}
}
```
License
This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).