Welcome to ParlBERT-Topic-German!
🏷 Model description
This model was trained on ~10k manually annotated interpellations (📚 Breunig/Schnatterer 2019), labeled with topics from the Comparative Agendas Project, to classify text into one of twenty labels (annotation codebook).
Note: "Interpellation is a formal request of a parliament to the respective government." (Wikipedia)
🗃 Dataset
Party | Speeches | Tokens |
---|---|---|
CDU/CSU | 7,635 | 4,862,654 |
SPD | 5,321 | 3,158,315 |
AfD | 3,465 | 1,844,707 |
FDP | 3,067 | 1,593,108 |
The Greens | 2,866 | 1,522,305 |
The Left | 2,671 | 1,394,089 |
cross-bencher | 200 | 86,170 |
🏃🏼‍♂️ Model training
ParlBERT-Topic-German was obtained by fine-tuning a domain-adapted model (GermanBERT fine-tuned on DeuParl) for topic classification on an interpellations dataset (📚 Breunig/Schnatterer 2019) annotated with topics from the Comparative Agendas Project.
🤖 Use
```python
from transformers import pipeline

# Load the fine-tuned topic classifier; return_all_scores=False keeps only the top label.
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german", return_all_scores=False)
text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
pipeline_classification_topics(text)  # predicted topic: Macroeconomics
```
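To label several texts at once, the pipeline from the snippet above can be wrapped in a small helper. The following is a minimal sketch, not part of the original model card: the names `load_classifier` and `label_texts` and the `min_score` threshold are hypothetical and chosen for illustration only. It assumes the default pipeline behavior of returning one `{"label", "score"}` dict per input text when given a list.

```python
from typing import Callable, Dict, List

MODEL_ID = "chkla/parlbert-topic-german"  # model name taken from the usage snippet


def load_classifier():
    """Build the text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that label_texts can be used with any compatible callable.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def label_texts(classifier: Callable, texts: List[str], min_score: float = 0.5) -> List[Dict]:
    """Classify several texts and flag low-confidence predictions.

    min_score is a hypothetical threshold for illustration; it is not a
    value recommended by the model authors.
    """
    predictions = classifier(texts)  # one {"label", "score"} dict per text
    results = []
    for text, pred in zip(texts, predictions):
        results.append(
            {
                "text": text,
                "label": pred["label"],
                "score": pred["score"],
                "confident": pred["score"] >= min_score,
            }
        )
    return results
```

A call such as `label_texts(load_classifier(), [text_a, text_b])` would return one record per text. Flagging low scores is one simple way to surface inputs that may fall outside the training domain.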
📊 Evaluation
The model was evaluated on an evaluation set comprising 20% of the data. F1 scores are reported in percent per label:
Label | F1 (%) | Support |
---|---|---|
International | 80.0 | 1,126 |
Defense | 85.0 | 1,099 |
Government | 71.3 | 989 |
Civil Rights | 76.5 | 978 |
Environment | 76.6 | 845 |
Transportation | 86.0 | 800 |
Law & Crime | 67.1 | 492 |
Energy | 78.6 | 424 |
Health | 78.2 | 418 |
Domestic Commerce | 64.4 | 382 |
Immigration | 81.0 | 376 |
Labor | 69.1 | 344 |
Macroeconomics | 62.8 | 339 |
Agriculture | 76.3 | 292 |
Social Welfare | 49.2 | 253 |
Technology | 63.0 | 252 |
Education | 71.6 | 183 |
Housing | 79.6 | 178 |
Foreign Trade | 61.5 | 139 |
Culture | 54.6 | 69 |
Public Lands | 45.4 | 55 |
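The table reports only per-label scores. As a rough summary, the unweighted (macro) average and the support-weighted average of the per-label F1 values can be computed directly from it. The sketch below does only that arithmetic. The two aggregates are derived here from the table and are not figures reported by the model authors, and a support-weighted mean of per-label F1 is not the same as micro-F1.

```python
# (F1 in percent, support) pairs copied from the evaluation table, in table order.
ROWS = [
    (80.0, 1126), (85.0, 1099), (71.3, 989), (76.5, 978), (76.6, 845),
    (86.0, 800), (67.1, 492), (78.6, 424), (78.2, 418), (64.4, 382),
    (81.0, 376), (69.1, 344), (62.8, 339), (76.3, 292), (49.2, 253),
    (63.0, 252), (71.6, 183), (79.6, 178), (61.5, 139), (54.6, 69),
    (45.4, 55),
]


def macro_f1(rows):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(support for _, support in rows)
    return sum(f1 * support for f1, support in rows) / total_support


print(f"macro F1:    {macro_f1(ROWS):.1f}")
print(f"weighted F1: {weighted_f1(ROWS):.1f}")
```

The macro average treats every label equally, so it is pulled down by small, weakly performing classes such as Public Lands and Culture. The weighted average is dominated by the large classes.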
⚠️ Limitations
Models are often highly topic-dependent. The model may therefore perform worse on topics and text types that are not represented in the training set.
👥 Cite
@article{klamm2022frameast,
title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
journal={ParlaCLARIN III at LREC2022},
year={2022}
}
🐦 Twitter: @chklamm