---
license: cc-by-4.0
language:
- multilingual
tags:
- zero-shot-classification
- text-classification
- pytorch
metrics:
- recall
- precision
- f1-score
extra_gated_prompt: >-
  Our models are intended for academic use only. If you are not affiliated
  with an academic institution, please provide a rationale for using our
  models. Please allow us a few business days to manually review
  subscriptions.
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  Institution Email: text
  Please specify your academic use case: text
---

# xlm-roberta-large-pooled-cap-media-v2

## Model description

An `xlm-roberta-large` model fine-tuned on multilingual (English, German, Hungarian, Spanish, Slovak) training data labelled with [major topic codes](https://www.comparativeagendas.net/pages/master-codebook) from the [Comparative Agendas Project](https://www.comparativeagendas.net/). Furthermore, we used 7 additional media codes, following [Boydstun (2013)](https://www.amber-boydstun.com/uploads/1/0/6/5/106535199/nyt_front_page_policy_agendas_codebook.pdf):

* State and Local Government Administration (24)
* Weather and Natural Disaster (26)
* Fires (27)
* Sports and Recreation (29)
* Death Notices (30)
* Churches and Religion (31)
* Other, Miscellaneous and Human Interest (99)

## How to use the model

```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-pooled-cap-media-v2",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    token=""  # your Hugging Face access token
)

text = "We will place an immediate 6-month halt on the finance driven closure of beds and wards, and set up an independent audit of needs and facilities."
pipe(text)
```

### Gated access

Because access to the model is gated, you must pass the `token` parameter when loading it. In earlier versions of the Transformers package, you may need to use the `use_auth_token` parameter instead.
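For post-processing predictions, the seven additional media codes listed above can be mapped back to readable labels with a small lookup table. This is a minimal sketch; the `label_for` helper is a hypothetical convenience function, not part of the model or the Transformers API:

```python
# Additional media codes used on top of the CAP major topic codes,
# following Boydstun (2013). Labels copied from the list above.
MEDIA_CODES = {
    24: "State and Local Government Administration",
    26: "Weather and Natural Disaster",
    27: "Fires",
    29: "Sports and Recreation",
    30: "Death Notices",
    31: "Churches and Religion",
    99: "Other, Miscellaneous and Human Interest",
}

def label_for(code: int) -> str:
    """Return the media-code label, or a generic CAP marker for other codes."""
    return MEDIA_CODES.get(code, f"CAP major topic {code}")

print(label_for(27))  # Fires
print(label_for(3))   # CAP major topic 3
```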
## Overall Performance:

* **Accuracy:** 74%
* **Macro Avg:** Precision: 0.76, Recall: 0.74, F1-score: 0.73
* **Weighted Avg:** Precision: 0.76, Recall: 0.74, F1-score: 0.73

## Per-Class Metrics:

| Class | precision | recall | f1-score | support |
|:----------------------------------------------|------------:|---------:|-----------:|------------:|
| 1: Macroeconomics | 0.773585 | 0.82 | 0.796117 | 50 |
| 2: Civil Rights | 0.714286 | 0.6 | 0.652174 | 50 |
| 3: Health | 0.803922 | 0.82 | 0.811881 | 50 |
| 4: Agriculture | 0.857143 | 0.84 | 0.848485 | 50 |
| 5: Labor | 0.666667 | 0.68 | 0.673267 | 50 |
| 6: Education | 0.86 | 0.86 | 0.86 | 50 |
| 7: Environment | 0.829787 | 0.78 | 0.804124 | 50 |
| 8: Energy | 0.851852 | 0.92 | 0.884615 | 50 |
| 9: Immigration | 0.888889 | 0.8 | 0.842105 | 50 |
| 10: Transportation | 0.661765 | 0.9 | 0.762712 | 50 |
| 12: Law and Crime | 0.679245 | 0.72 | 0.699029 | 50 |
| 13: Social Welfare | 0.842105 | 0.64 | 0.727273 | 50 |
| 14: Housing | 0.666667 | 0.8 | 0.727273 | 50 |
| 15: Banking, Finance, and Domestic Commerce | 0.714286 | 0.6 | 0.652174 | 50 |
| 16: Defense | 0.596154 | 0.62 | 0.607843 | 50 |
| 17: Technology | 0.709091 | 0.78 | 0.742857 | 50 |
| 18: Foreign Trade | 0.88 | 0.88 | 0.88 | 50 |
| 19: International Affairs | 0.534483 | 0.62 | 0.574074 | 50 |
| 20: Government Operations | 0.790698 | 0.68 | 0.731183 | 50 |
| 21: Public Lands | 0.808511 | 0.76 | 0.783505 | 50 |
| 23: Culture | 0.678571 | 0.76 | 0.716981 | 50 |
| 24: State and Local Government Administration | 0.587302 | 0.74 | 0.654867 | 50 |
| 26: Weather and Natural Disasters | 0.913043 | 0.84 | 0.875 | 50 |
| 27: Fires | 0.942857 | 0.66 | 0.776471 | 50 |
| 29: Sports and Recreation | 0.843137 | 0.86 | 0.851485 | 50 |
| 30: Death Notices | 0.956522 | 0.88 | 0.916667 | 50 |
| 31: Churches and Religion | 0.782609 | 0.72 | 0.75 | 50 |
| 99: Other, Miscellaneous, and Human Interest | 0.378947 | 0.72 | 0.496552 | 50 |
| 998: No Policy and No Media Content | 0.75 | 0.06 | 0.111111 | 50 |
| accuracy | | | 0.736552 | 1450 |
| macro avg | 0.757315 | 0.736552 | 0.731373 | 1450 |
| weighted avg | 0.757315 | 0.736552 | 0.731373 | 1450 |

## Inference platform

This model is used by the [CAP Babel Machine](https://babel.poltextlab.com), an open-source and free natural language processing tool designed to simplify and speed up projects for comparative research.

## Cooperation

Model performance can be significantly improved by extending our training sets. We appreciate every submission of CAP-coded corpora (of any domain and language) at poltextlab{at}poltextlab{dot}com or through the [CAP Babel Machine](https://babel.poltextlab.com).

## Debugging and issues

This architecture uses the `sentencepiece` tokenizer. To run the model with Transformers versions earlier than `4.27`, you need to install it manually. If you encounter a `RuntimeError` when loading the model with the `from_pretrained()` method, passing `ignore_mismatched_sizes=True` should solve the issue.
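A note on the evaluation numbers: since every class in the per-class table has the same support (50 documents across 29 classes, 1450 in total), the macro and weighted averages necessarily coincide. A minimal sketch verifying this from the F1 column of the table:

```python
# Per-class F1 scores copied from the table above (29 classes, support 50 each).
f1_scores = [
    0.796117, 0.652174, 0.811881, 0.848485, 0.673267, 0.86, 0.804124,
    0.884615, 0.842105, 0.762712, 0.699029, 0.727273, 0.727273, 0.652174,
    0.607843, 0.742857, 0.88, 0.574074, 0.731183, 0.783505, 0.716981,
    0.654867, 0.875, 0.776471, 0.851485, 0.916667, 0.75, 0.496552, 0.111111,
]
supports = [50] * len(f1_scores)

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1_scores) / len(f1_scores)
# Weighted average: mean weighted by class support.
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(f"macro F1    = {macro_f1:.6f}")     # ≈ 0.731373, matching the table
print(f"weighted F1 = {weighted_f1:.6f}")  # identical, since supports are equal
```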