roberta-large-group-mention-detector-uk-manifestos

roberta-large model finetuned for social group mention detection in political texts

Model Details

Model Description

Token classification model for (social) group mention detection based on Licht & Sczepanski (2025)

This token classification model has been finetuned on human sequence annotations of sentences from British parties' election manifestos for the following entity types:

  • social group
  • implicit social group reference
  • political group
  • political institution
  • organization, public institution, or collective actor

Please refer to Licht & Sczepanski (2025) for details.
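To see how these entity types are encoded as token-level labels, you can inspect the label inventory shipped with the model (a minimal sketch; it assumes the usual BIO tagging convention, with one B-/I- label pair per entity type):

from transformers import AutoConfig

model_id = "haukelicht/roberta-large-group-mention-detector-uk-manifestos"

# the config stores the id-to-label mapping used by the token classifier
config = AutoConfig.from_pretrained(model_id)
print(config.id2label)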

  • Developed by: Hauke Licht
  • Model type: roberta
  • Language(s) (NLP): ['en']
  • License: apache-2.0
  • Finetuned from model: roberta-large
  • Funded by: Center for Comparative and International Studies of the ETH Zurich and the University of Zurich and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866

Model Sources

Uses

Bias, Risks, and Limitations

  • Evaluation of the classifier on held-out data shows that it makes mistakes (see the Results section below).
  • The model has been finetuned only on human-annotated sentences sampled from British parties' election manifestos. Applying the classifier in other domains can lead to higher error rates than those reported in the Results section below.
  • The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.

Recommendations

  • Users who want to apply the model outside its training data domain (British parties' election manifestos) should evaluate its performance on the target data (see the sketch below).
  • Users who want to apply the model outside its training data domain should continue to finetune this model on labeled data from the target domain.
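One way to run such a target-domain evaluation is with the strict seqeval metric also reported in the Results section below. A minimal sketch, assuming you have gold and predicted BIO tag sequences for your own sentences (the label strings here are placeholders; use the model's actual labels from its config):

from seqeval.metrics import classification_report, f1_score

# one list of BIO tags per sentence: gold annotations vs. model predictions
y_true = [["O", "B-social group", "I-social group", "O", "O"]]
y_pred = [["O", "B-social group", "I-social group", "O", "O"]]

print(f1_score(y_true, y_pred))  # strict span-level F1
print(classification_report(y_true, y_pred))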

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline

model_id = "haukelicht/roberta-large-group-mention-detector-uk-manifestos"

# "simple" aggregation merges word pieces into whole-word entity spans
classifier = pipeline(task="ner", model=model_id, aggregation_strategy="simple")

text = "Our party fights for the deprived and the vulnerable in our country."
annotations = classifier(text)
print(annotations)

# get the annotations' character start and end indexes
locations = [(anno["start"], anno["end"]) for anno in annotations]
print(locations)

# index the source text using the first annotation as an example
loc = locations[0]
print(text[slice(*loc)])
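For larger corpora, the pipeline also accepts a list of texts and can batch them on the underlying model (a minimal sketch; the batch size is an illustrative choice, not a recommendation from the authors):

texts = [
    "We will protect pensioners and working families.",
    "Parliament must hold the government to account.",
]

# passing a list lets the pipeline process inputs in batches
batched_annotations = classifier(texts, batch_size=8)
for sentence, annos in zip(texts, batched_annotations):
    print(sentence, annos)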

Training Details

Training Data

The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

Training Procedure

Training Hyperparameters

  • epochs: 6
  • learning rate: 1e-05
  • batch size: 32
  • weight decay: 0.01
  • warmup ratio: 0.1
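For reference, these settings map one-to-one onto Hugging Face TrainingArguments (a minimal sketch, not the authors' actual training script; see the linked GitHub repository for that):

from transformers import TrainingArguments

# the hyperparameters reported above, expressed as Trainer arguments
training_args = TrainingArguments(
    output_dir="group-mention-detector",
    num_train_epochs=6,
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    weight_decay=0.01,
    warmup_ratio=0.1,
)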

Evaluation

Testing Data, Factors & Metrics

Testing Data

The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

Metrics

  • seq-eval F1: strict sequence labeling evaluation metric in the style of the CoNLL-2000 shared task, computed with https://github.com/chakki-works/seqeval
  • "soft" seq-eval F1: a more lenient sequence labeling evaluation metric that reports span-level average performance summarized across examples, per https://github.com/haukelicht/soft-seqeval
  • sentence-level F1: binary measure of detection performance that counts a sentence as a positive example/prediction if it contains at least one entity of the given type (see the sketch below)
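The sentence-level metric can be reproduced from span annotations with standard tooling (a minimal sketch following the definition above; using scikit-learn here is an assumption, not necessarily the authors' implementation):

from sklearn.metrics import f1_score

# per sentence: 1 if it contains at least one span of the target entity
# type (in the gold annotations resp. the predictions), else 0; toy values
gold_has_mention = [1, 0, 1, 1]
pred_has_mention = [1, 0, 0, 1]

print(f1_score(gold_has_mention, pred_has_mention))  # binary F1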

Results

| type | seq-eval F1 | soft seq-eval F1 | sentence-level F1 |
|---|---|---|---|
| social group | 0.739 | 0.789 | 0.941 |
| political group | 0.914 | 0.917 | 0.987 |
| political institution | 0.700 | 0.740 | 0.958 |
| organization, public institution, or collective actor | 0.613 | 0.625 | 0.935 |
| implicit social group reference | 0.731 | 0.634 | 0.956 |

Citation

BibTeX:

[More Information Needed]

APA:

Licht, H., & Sczepanski, R. (2025). Detecting group mentions in political rhetoric: A supervised learning approach. British Journal of Political Science (forthcoming). Preprint available at OSF.

More Information

https://github.com/haukelicht/group_mention_detection/release

Model Card Contact

[email protected]
