# mdeberta-v3-base-subjectivity-multilingual-no-arabic
This model is a fine-tuned version of microsoft/mdeberta-v3-base for subjectivity detection in news articles. It was presented in the paper AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles.
GitHub Repository: For the official code and more details, please refer to the GitHub repository.
It achieves the following results on the evaluation set:
- Loss: 0.7196
- Macro F1: 0.8071
- Macro P: 0.8037
- Macro R: 0.8123
- Subj F1: 0.7658
- Subj P: 0.7367
- Subj R: 0.7973
- Accuracy: 0.8159
## Model description

This model is a fine-tuned version of microsoft/mdeberta-v3-base for subjectivity detection in news articles. It classifies sentences as subjective or objective across monolingual, multilingual, and zero-shot settings. The core innovation is the integration of sentiment scores, derived from an auxiliary sentiment model, with the transformer's sentence representations. This sentiment-augmented architecture, applied here to mDeBERTaV3-base, aims to improve over standard fine-tuning, particularly on the subjective F1 score. Decision-threshold calibration was also employed to address class imbalance.
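The official implementation is in the GitHub repository. As a rough illustration of the idea only (not the authors' exact architecture), a sentiment-augmented head might concatenate auxiliary sentiment probabilities with the pooled sentence embedding before classification; the class name and the three-way sentiment feature split below are illustrative assumptions:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SentimentAugmentedClassifier(nn.Module):
    """Illustrative sentiment-augmented head (names are hypothetical)."""

    def __init__(self, encoder_name="microsoft/mdeberta-v3-base",
                 num_sentiment_features=3, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # The classifier sees [sentence embedding ; sentiment probabilities].
        self.classifier = nn.Linear(hidden + num_sentiment_features, num_labels)

    def forward(self, input_ids, attention_mask, sentiment_scores):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = out.last_hidden_state[:, 0]  # [CLS] token representation
        combined = torch.cat([cls_embedding, sentiment_scores], dim=-1)
        return self.classifier(combined)
```

Here `sentiment_scores` would be, for example, negative/neutral/positive probabilities produced by a separate sentiment model for the same sentence.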
## Intended uses & limitations
This model is intended to identify whether a sentence is subjective (e.g., opinion-laden) or objective, making it a valuable tool for combating misinformation, improving fact-checking pipelines, and supporting journalists in content analysis.
Limitations:

- This specific model (`multilingual-no-arabic`) was fine-tuned on the multilingual dataset excluding Arabic data.
- While designed for multilingual and zero-shot transfer, performance can vary significantly across languages and domains.
- The original submission inadvertently used a custom train/dev mix, which skewed the class distribution and left the decision threshold under-calibrated, yielding an official multilingual Macro F1 of 0.24. Re-evaluation with the correct data split yielded Macro F1 = 0.68, which would have placed the model 9th overall in the challenge (see the calibration sketch after this list).
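As a hedged illustration of what decision-threshold calibration involves (not necessarily the authors' exact procedure), one can sweep thresholds over development-set probabilities and keep the one that maximizes macro F1:

```python
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(dev_probs, dev_labels):
    """Return the threshold on P(subjective) that maximizes macro F1 on the dev set."""
    best_t, best_f1 = 0.5, 0.0
    for t in np.linspace(0.05, 0.95, 91):  # step of 0.01
        preds = (np.asarray(dev_probs) >= t).astype(int)
        f1 = f1_score(dev_labels, preds, average="macro")
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

A threshold calibrated on a mismatched dev split, as in the original submission, transfers poorly to the official test distribution, which explains the gap between the 0.24 and 0.68 scores.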
## Training and evaluation data
Training and development datasets were provided for Arabic, German, English, Italian, and Bulgarian as part of the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles. This specific model was trained on the multilingual dataset, excluding Arabic data. Final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization capabilities.
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 6
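For reference, these settings roughly correspond to the following `transformers` `TrainingArguments`. This is a sketch under the assumption of a standard `Trainer` setup; dataset loading and the sentiment-augmentation logic are omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mdeberta-v3-base-subjectivity-multilingual-no-arabic",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",        # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    num_train_epochs=6,
    eval_strategy="epoch",      # matches the per-epoch validation results below
)
```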
### Training results

| Training Loss | Epoch | Step | Validation Loss | Macro F1 | Macro P | Macro R | Subj F1 | Subj P | Subj R | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 249 | 0.5109 | 0.7809 | 0.7854 | 0.7775 | 0.7219 | 0.7467 | 0.6986 | 0.7968 |
| No log | 2.0 | 498 | 0.4569 | 0.7745 | 0.7806 | 0.7984 | 0.7499 | 0.6506 | 0.8849 | 0.7771 |
| 0.5055 | 3.0 | 747 | 0.4929 | 0.8041 | 0.8002 | 0.8122 | 0.7655 | 0.7226 | 0.8137 | 0.8118 |
| 0.5055 | 4.0 | 996 | 0.5909 | 0.8105 | 0.8065 | 0.8212 | 0.7757 | 0.7217 | 0.8384 | 0.8170 |
| 0.2749 | 5.0 | 1245 | 0.7195 | 0.7996 | 0.8038 | 0.7963 | 0.7461 | 0.7689 | 0.7247 | 0.8139 |
| 0.2749 | 6.0 | 1494 | 0.7196 | 0.8071 | 0.8037 | 0.8123 | 0.7658 | 0.7367 | 0.7973 | 0.8159 |
### Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.3.1
- Tokenizers 0.21.0
## How to use

You can use the model with the `pipeline` API from the `transformers` library for text classification:
```python
from transformers import pipeline

# Load the text classification pipeline
classifier = pipeline(
    "text-classification",
    model="MatteoFasulo/mdeberta-v3-base-subjectivity-multilingual-no-arabic",
    tokenizer="microsoft/mdeberta-v3-base",
)

# A subjective sentence
result_subj = classifier("This is a truly amazing and groundbreaking discovery!")
print(f"Sentence: 'This is a truly amazing and groundbreaking discovery!' -> {result_subj}")

# An objective sentence
result_obj = classifier("The new policy will be implemented next quarter.")
print(f"Sentence: 'The new policy will be implemented next quarter.' -> {result_obj}")
```
## Citation

If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{fasulo2025aiwizardscheckthat2025,
      title={AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles},
      author={Matteo Fasulo and Luca Babboni and Luca Tedeschini},
      year={2025},
      eprint={2507.11764},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.11764},
}
```
You can find the official paper on Hugging Face Papers: AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles.
## License
This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).