---
base_model: EuroBERT/EuroBERT-610m
language:
- en
license: apache-2.0
tags:
- text
- token-classification
- named-entity-recognition
- encoder-only
- euro-bert
- fine-tuned
- domain-specific
metrics:
- seqeval
model-index:
- name: EuroBERT-610m-group-mention-detector-uk-manifestos
  results:
  - task:
      type: token-classification
      name: Token classification
    dataset:
      type: custom
      name: custom human-labeled sequence annotation dataset (see model card details)
    metrics:
    - type: seqeval
      name: social group (seqeval)
      value: 0.637704918032787
    - type: seqeval
      name: political group (seqeval)
      value: 0.9045226130653266
    - type: seqeval
      name: political institution (seqeval)
      value: 0.6445264452644527
    - type: seqeval
      name: organization, public institution, or collective actor (seqeval)
      value: 0.5463258785942493
    - type: seqeval
      name: implicit social group reference (seqeval)
      value: 0.6131805157593122
---

# EuroBERT-610m-group-mention-detector-uk-manifestos

[EuroBERT/EuroBERT-610m](https://huggingface.co/EuroBERT/EuroBERT-610m) model fine-tuned for social group mention detection in political texts.

## Model Details

### Model Description

Token classification model for (social) group mention detection based on [Licht & Sczepanski (2025)](https://doi.org/10.31219/osf.io/ufb96).

This token classification model has been fine-tuned on human sequence annotations of sentences from British parties' election manifestos for the following entity types:

- social group
- implicit social group reference
- political group
- political institution
- organization, public institution, or collective actor

Please refer to [Licht & Sczepanski (2025)](https://doi.org/10.31219/osf.io/ufb96) for details.
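The five entity types above define the model's tag inventory under the standard BIO scheme used for token classification. The sketch below illustrates this; the literal tag strings are an assumption for illustration, and the authoritative mapping is the `id2label` entry in the model's config:

```python
# Sketch: build the BIO tag inventory implied by the five entity types.
# NOTE: the literal tag strings are an assumption for illustration; the
# model's actual label names are stored in its config (id2label mapping).
ENTITY_TYPES = [
    "social group",
    "implicit social group reference",
    "political group",
    "political institution",
    "organization, public institution, or collective actor",
]

def bio_tags(entity_types):
    """Return 'O' plus a B- (begin) and I- (inside) tag per entity type."""
    tags = ["O"]
    for t in entity_types:
        tags.extend([f"B-{t}", f"I-{t}"])
    return tags

print(bio_tags(ENTITY_TYPES))  # 11 labels in total: 'O' + 2 x 5 types
```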
- **Developed by:** Hauke Licht
- **Model type:** eurobert
- **Language(s) (NLP):** ['en']
- **License:** apache-2.0
- **Finetuned from model:** EuroBERT/EuroBERT-610m
- **Funded by:** *Center for Comparative and International Studies* of the ETH Zurich and the University of Zurich and the *Deutsche Forschungsgemeinschaft* (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866

### Model Sources

- **Repository:** https://github.com/haukelicht/group_mention_detection/release/
- **Paper:** https://doi.org/10.31219/osf.io/ufb96
- **Demo:** [More Information Needed]

## Uses

### Bias, Risks, and Limitations

- Evaluation of the classifier on held-out data shows that it makes mistakes (see section *Results*).
- The model has been fine-tuned only on human-annotated sentences sampled from British parties' election manifestos. Applying the classifier in other domains can lead to higher error rates than those reported in section *Results* below.
- The data used to fine-tune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.

#### Recommendations

- Users who want to apply the model outside its training data domain (British parties' election programs) should evaluate its performance on the target data.
- Users who want to apply the model outside its training data domain (British parties' election programs) should continue to fine-tune this model on labeled data from the target domain.

### How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

model_id = "haukelicht/EuroBERT-610m-group-mention-detector-uk-manifestos"
classifier = pipeline(task="ner", model=model_id, aggregation_strategy="simple")

text = "Our party fights for the deprived and the vulnerable in our country."
annotations = classifier(text)
print(annotations)

# get the annotations' character start and end indexes
locations = [(anno['start'], anno['end']) for anno in annotations]
locations

# index the source text, using the first annotation as an example
loc = locations[0]
text[slice(*loc)]
```

## Training Details

### Training Data

The train, dev, and test splits used for model fine-tuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

### Training Procedure

#### Training Hyperparameters

- epochs: 5
- learning rate: 1e-05
- batch size: 32
- weight decay: 0.01
- warmup ratio: 0.1

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The train, dev, and test splits used for model fine-tuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits

#### Metrics

- seq-eval F1: strict sequence labeling evaluation metric per the CoNLL-2000 shared task, based on https://github.com/chakki-works/seqeval
- "soft" seq-eval F1: a more lenient sequence labeling evaluation metric that reports span-level average performance summarized across examples, per https://github.com/haukelicht/soft-seqeval
- sentence-level F1: binary measure of detection performance that counts a sentence as a positive example/prediction if it contains at least one entity of the given type

### Results

| type | seq-eval F1 | soft seq-eval F1 | sentence-level F1 |
|-------------------------------------------------------|-------|-------|-------|
| social group | 0.638 | 0.739 | 0.928 |
| political group | 0.905 | 0.920 | 0.990 |
| political institution | 0.645 | 0.698 | 0.954 |
| organization, public institution, or collective actor | 0.546 | 0.552 | 0.928 |
| implicit social group reference | 0.613 | 0.537 | 0.943 |

## Citation

**BibTeX:** [More Information Needed]

**APA:** Licht, H., & Sczepanski, R. (2025).
Detecting Group Mentions in Political Rhetoric: A Supervised Learning Approach. Forthcoming in *British Journal of Political Science*. Preprint available at [OSF](https://doi.org/10.31219/osf.io/ufb96).

## More Information

https://github.com/haukelicht/group_mention_detection/release

## Model Card Contact

hauke.licht@uibk.ac.at