mdeberta-v3-base-subjectivity-multilingual-no-arabic

This model is a fine-tuned version of microsoft/mdeberta-v3-base for subjectivity detection in news articles. It was presented in the paper AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles.

GitHub Repository: The official code and further details are available in the project's GitHub repository.

It achieves the following results on the evaluation set:

  • Loss: 0.7196
  • Macro F1: 0.8071
  • Macro P: 0.8037
  • Macro R: 0.8123
  • Subj F1: 0.7658
  • Subj P: 0.7367
  • Subj R: 0.7973
  • Accuracy: 0.8159

Model description

This model is a fine-tuned version of microsoft/mdeberta-v3-base for Subjectivity Detection in News Articles. It classifies sentences as subjective or objective across monolingual, multilingual, and zero-shot settings. The core innovation lies in enhancing transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations. This sentiment-augmented architecture, applied here with mDeBERTaV3-base, aims to improve upon standard fine-tuning, particularly boosting subjective F1 score. Decision threshold calibration was also employed to address class imbalance.
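The sentiment-augmentation idea can be sketched as follows. This is a minimal illustration with numpy, not the exact implementation: the fusion-by-concatenation detail, the three-way sentiment scores, and the untrained linear head are assumptions made for clarity (mDeBERTaV3-base's hidden size of 768 is real).

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled sentence embedding from the transformer (mDeBERTaV3-base hidden size: 768).
sentence_embedding = rng.standard_normal(768)

# Sentiment scores from an auxiliary model, e.g. P(negative), P(neutral), P(positive).
sentiment_scores = np.array([0.1, 0.2, 0.7])

# Fuse by concatenation, then classify with a linear head (objective vs. subjective).
fused = np.concatenate([sentence_embedding, sentiment_scores])  # shape (771,)
W = rng.standard_normal((2, 771)) * 0.01  # untrained, illustrative weights
b = np.zeros(2)
logits = W @ fused + b
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the two classes
```

In practice the head's weights are learned during fine-tuning; the point is only that the classifier sees the sentence representation and the sentiment signal jointly.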

Intended uses & limitations

This model is intended to identify whether a sentence is subjective (e.g., opinion-laden) or objective, making it a valuable tool for combating misinformation, improving fact-checking pipelines, and supporting journalists in content analysis.

Limitations:

  • This specific model (multilingual-no-arabic) was fine-tuned on the multilingual dataset excluding Arabic data.
  • While designed for multilingual and zero-shot transfer, performance can vary significantly across languages and specific domains.
  • A mistake in the original submission pipeline meant a custom train/dev mix was inadvertently used, skewing the class distribution and leaving the decision threshold under-calibrated; the official multilingual Macro F1 was 0.24. Re-evaluation with the correct data split yielded Macro F1 = 0.68, which would have placed the model 9th overall in the challenge.

Training and evaluation data

Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian as part of the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. This specific model was trained on the multilingual dataset, excluding Arabic data. Final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization capabilities.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 6
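As a sketch, these hyperparameters roughly correspond to the following transformers TrainingArguments (argument names per the Transformers 4.x API; the output path is a placeholder, and any options not listed above are left at their defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mdeberta-v3-base-subjectivity",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",  # AdamW; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=6,
)
```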

Training results

| Training Loss | Epoch | Step | Validation Loss | Macro F1 | Macro P | Macro R | Subj F1 | Subj P | Subj R | Accuracy |
|---------------|-------|------|-----------------|----------|---------|---------|---------|--------|--------|----------|
| No log        | 1.0   | 249  | 0.5109          | 0.7809   | 0.7854  | 0.7775  | 0.7219  | 0.7467 | 0.6986 | 0.7968   |
| No log        | 2.0   | 498  | 0.4569          | 0.7745   | 0.7806  | 0.7984  | 0.7499  | 0.6506 | 0.8849 | 0.7771   |
| 0.5055        | 3.0   | 747  | 0.4929          | 0.8041   | 0.8002  | 0.8122  | 0.7655  | 0.7226 | 0.8137 | 0.8118   |
| 0.5055        | 4.0   | 996  | 0.5909          | 0.8105   | 0.8065  | 0.8212  | 0.7757  | 0.7217 | 0.8384 | 0.8170   |
| 0.2749        | 5.0   | 1245 | 0.7195          | 0.7996   | 0.8038  | 0.7963  | 0.7461  | 0.7689 | 0.7247 | 0.8139   |
| 0.2749        | 6.0   | 1494 | 0.7196          | 0.8071   | 0.8037  | 0.8123  | 0.7658  | 0.7367 | 0.7973 | 0.8159   |

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 3.3.1
  • Tokenizers 0.21.0

How to use

You can use the model with the pipeline API from the transformers library for text classification:

from transformers import pipeline

# Load the text classification pipeline
classifier = pipeline(
    "text-classification",
    model="MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual-no-arabic",
    tokenizer="microsoft/mdeberta-v3-base",
)

# Example usage:
# A subjective sentence
result_subj = classifier("This is a truly amazing and groundbreaking discovery!")
print(f"Sentence: 'This is a truly amazing and groundbreaking discovery!' -> {result_subj}")

# An objective sentence
result_obj = classifier("The new policy will be implemented next quarter.")
print(f"Sentence: 'The new policy will be implemented next quarter.' -> {result_obj}")

Citation

If you find our work helpful or inspiring, please feel free to cite it:

@misc{fasulo2025aiwizardscheckthat2025,
      title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles}, 
      author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
      year={2025},
      eprint={2507.11764},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.11764}, 
}

You can find the official paper on Hugging Face Papers: AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles.

License

This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Model size: 279M parameters (F32, Safetensors)
