roberta-large-group-mention-detector-uk-manifestos

roberta-large model finetuned for social group mention detection in political texts

Model Details

Model Description

Token classification model for (social) group mention detection based on Licht & Sczepanski (2025)

This token classification model has been finetuned on human sequence annotations of sentences from British parties' election manifestos for the following entity types:

  • social group
  • implicit social group reference
  • political group
  • political institution
  • organization, public institution, or collective actor

Please refer to Licht & Sczepanski (2025) for details.
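To see how these entity types are encoded as token-level labels, you can inspect the label inventory shipped with the model (a minimal sketch; it assumes the usual BIO tagging convention, with one B-/I- label pair per entity type):

from transformers import AutoConfig

model_id = "haukelicht/roberta-large-group-mention-detector-uk-manifestos"

# the config stores the id-to-label mapping used by the token classifier
config = AutoConfig.from_pretrained(model_id)
print(config.id2label)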

  • Developed by: Hauke Licht
  • Model type: roberta
  • Language(s) (NLP): ['en']
  • License: apache-2.0
  • Finetuned from model: roberta-large
  • Funded by: Center for Comparative and International Studies of the ETH Zurich and the University of Zurich and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866

Model Sources

Uses

Bias, Risks, and Limitations

  • Evaluation of the classifier on held-out data shows that it makes mistakes (see the Results section below).
  • The model has been finetuned only on human-annotated sentences sampled from British parties' election manifestos. Applying the classifier in other domains can lead to higher error rates than those reported in the Results section below.
  • The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.

Recommendations

  • Users who want to apply the model outside its training data domain (British parties' election manifestos) should evaluate its performance on the target data (see the sketch below).
  • Users who want to apply the model outside its training data domain should continue to finetune this model on labeled data from the target domain.
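One way to run such a target-domain evaluation is with the strict seqeval metric also reported in the Results section below. A minimal sketch, assuming you have gold and predicted BIO tag sequences for your own sentences (the label strings here are placeholders; use the model's actual labels from its config):

from seqeval.metrics import classification_report, f1_score

# one list of BIO tags per sentence: gold annotations vs. model predictions
y_true = [["O", "B-social group", "I-social group", "O", "O"]]
y_pred = [["O", "B-social group", "I-social group", "O", "O"]]

print(f1_score(y_true, y_pred))  # strict span-level F1
print(classification_report(y_true, y_pred))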

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline

model_id = "haukelicht/roberta-large-group-mention-detector-uk-manifestos"

# "simple" aggregation merges word pieces into whole-word entity spans
classifier = pipeline(task="ner", model=model_id, aggregation_strategy="simple")

text = "Our party fights for the deprived and the vulnerable in our country."
annotations = classifier(text)
print(annotations)

# get the annotations' character start and end indexes
locations = [(anno["start"], anno["end"]) for anno in annotations]
print(locations)

# index the source text using the first annotation as an example
loc = locations[0]
print(text[slice(*loc)])
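For larger corpora, the pipeline also accepts a list of texts and can batch them on the underlying model (a minimal sketch; the batch size is an illustrative choice, not a recommendation from the authors):

texts = [
    "We will protect pensioners and working families.",
    "Parliament must hold the government to account.",
]

# passing a list lets the pipeline process inputs in batches
batched_annotations = classifier(texts, batch_size=8)
for sentence, annos in zip(texts, batched_annotations):
    print(sentence, annos)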

Training Details

Training Data

The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

Training Procedure

Training Hyperparameters

  • epochs: 6
  • learning rate: 1e-05
  • batch size: 32
  • weight decay: 0.01
  • warmup ratio: 0.1
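For reference, these settings map one-to-one onto Hugging Face TrainingArguments (a minimal sketch, not the authors' actual training script; see the linked GitHub repository for that):

from transformers import TrainingArguments

# the hyperparameters reported above, expressed as Trainer arguments
training_args = TrainingArguments(
    output_dir="group-mention-detector",
    num_train_epochs=6,
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    weight_decay=0.01,
    warmup_ratio=0.1,
)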

Evaluation

Testing Data, Factors & Metrics

Testing Data

The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

Metrics

  • seq-eval F1: strict sequence labeling evaluation metric in the style of the CoNLL-2000 shared task, computed with https://github.com/chakki-works/seqeval
  • "soft" seq-eval F1: a more lenient sequence labeling evaluation metric that reports span-level average performance summarized across examples, per https://github.com/haukelicht/soft-seqeval
  • sentence-level F1: binary measure of detection performance that counts a sentence as a positive example/prediction if it contains at least one entity of the given type (see the sketch below)
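The sentence-level metric can be reproduced from span annotations with standard tooling (a minimal sketch following the definition above; using scikit-learn here is an assumption, not necessarily the authors' implementation):

from sklearn.metrics import f1_score

# per sentence: 1 if it contains at least one span of the target entity
# type (in the gold annotations resp. the predictions), else 0; toy values
gold_has_mention = [1, 0, 1, 1]
pred_has_mention = [1, 0, 0, 1]

print(f1_score(gold_has_mention, pred_has_mention))  # binary F1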

Results

| type | seq-eval F1 | soft seq-eval F1 | sentence-level F1 |
|---|---|---|---|
| social group | 0.739 | 0.789 | 0.941 |
| political group | 0.914 | 0.917 | 0.987 |
| political institution | 0.700 | 0.740 | 0.958 |
| organization, public institution, or collective actor | 0.613 | 0.625 | 0.935 |
| implicit social group reference | 0.731 | 0.634 | 0.956 |

Citation

BibTeX:

[More Information Needed]

APA:

Licht, H., & Sczepanski, R. (2025). Detecting group mentions in political rhetoric: A supervised learning approach. British Journal of Political Science (forthcoming). Preprint available at OSF.

More Information

https://github.com/haukelicht/group_mention_detection/release

Model Card Contact

[email protected]
