library_name: transformers
base_model: cardiffnlp/twitter-xlm-roberta-base-sentiment
tags:
  - text-classification
  - multi-label-classification
  - multi-head-classification
  - disaster-response
  - humanitarian-aid
  - social-media
  - twitter
  - generated_from_trainer
model-index:
  - name: xlm-roberta-sentiment-requests
    results:
      - task:
          type: text-classification
        dataset:
          name: community-datasets/disaster_response_messages
          type: community-datasets
          config: default
          split: evaluation
        metrics:
          - name: F1 Micro
            type: f1
            value: 0.724
          - name: F1 Macro
            type: f1
            value: 0.3505
          - name: Subset Accuracy
            type: accuracy
            value: 0.2588
datasets:
  - community-datasets/disaster_response_messages
pipeline_tag: text-classification
language:
  - en
  - multilingual
xlm-roberta-sentiment-requests
This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment on the community-datasets/disaster_response_messages dataset. It has been adapted into a multi-head classification model for analyzing social media messages posted during disaster events.
It achieves the following results on the evaluation set:
- Loss: 0.1465
- F1 Micro: 0.7240
- F1 Macro: 0.3505
- Subset Accuracy: 0.2588
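The three metrics above measure different things, which helps explain why they differ so much. The sketch below is not the evaluation code used for this model; it is a minimal pure-Python illustration on made-up toy labels, and the function names are hypothetical. F1 Micro pools true/false positives over all labels, F1 Macro averages the per-label F1 scores with equal weight, and Subset Accuracy counts a message as correct only if every label matches.

```python
from typing import List, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_metrics(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Return (f1_micro, f1_macro, subset_accuracy) for 0/1 label matrices."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    exact_matches = 0
    for true_row, pred_row in zip(y_true, y_pred):
        exact_matches += int(true_row == pred_row)
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t == 1 and p == 1:
                counts[j][0] += 1
            elif t == 0 and p == 1:
                counts[j][1] += 1
            elif t == 1 and p == 0:
                counts[j][2] += 1
    micro = _f1(*(sum(c[k] for c in counts) for k in range(3)))
    macro = sum(_f1(*c) for c in counts) / n_labels
    subset_accuracy = exact_matches / len(y_true)
    return micro, macro, subset_accuracy


# Toy data: one frequent label that is always right, two rare labels that are never predicted.
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
micro, macro, subset = multilabel_metrics(y_true, y_pred)
print(f"{micro:.3f} {macro:.3f} {subset:.3f}")  # 0.800 0.333 0.500
```

In this toy case the rare labels pull F1 Macro far below F1 Micro. A similar effect from infrequent categories is one plausible reading of the gap reported above, though the per-category scores are not given here.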
Model description
This model uses a shared XLM-RoBERTa base to encode input text. The resulting text representation is then fed into two separate, independent classification layers (heads):
- A Sentiment Head (Frozen from pre-trained model) with 3 outputs for the positive, neutral, and negative classes.
- A Multi-Label Head (Newly created and fine-tuned) with 41 outputs, which are decoded into 37 disaster-related label fields: 35 binary (yes/no) categories plus the three-class genre and related fields.
This dual-head architecture allows for a nuanced understanding of a message, capturing both its emotional content and its specific, actionable information.
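The relationship between the 41 outputs and the 37 decoded fields can be checked with simple arithmetic. The sketch below assumes, based on the label definitions given later in this card, that 35 fields are binary with one output each and that genre and related each use one output per class. The column order and the `build_column_names` helper are hypothetical; the real layout is read from metadata.json.

```python
from typing import Dict, List

# Binary fields as listed in the label definitions (order here is illustrative only).
BINARY_TASKS: List[str] = [
    "request", "offer", "aid_related", "medical_help", "medical_products",
    "search_and_rescue", "security", "military", "child_alone", "water",
    "food", "shelter", "clothing", "money", "missing_people", "refugees",
    "death", "other_aid", "infrastructure_related", "transport", "buildings",
    "electricity", "tools", "hospitals", "shops", "aid_centers",
    "other_infrastructure", "weather_related", "floods", "storm", "fire",
    "earthquake", "cold", "other_weather", "direct_report",
]
# Multi-class fields mapped to their number of classes.
MULTICLASS_TASKS: Dict[str, int] = {"genre": 3, "related": 3}


def build_column_names(binary: List[str], multiclass: Dict[str, int]) -> List[str]:
    """One column per binary field, then `<task>_<class index>` per multi-class field."""
    columns = list(binary)
    for task, n_classes in multiclass.items():
        columns.extend(f"{task}_{i}" for i in range(n_classes))
    return columns


columns = build_column_names(BINARY_TASKS, MULTICLASS_TASKS)
n_fields = len(BINARY_TASKS) + len(MULTICLASS_TASKS)
print(n_fields, len(columns))  # 37 41
```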
Intended uses & limitations
This model is intended for organizations and researchers involved in humanitarian aid and disaster response. Potential applications include:
- Automated Triage: Quickly sorting through thousands of social media messages to identify the most urgent requests for help.
- Situational Awareness: Building a real-time map of needs by aggregating categorized messages.
- Resource Allocation: Directing resources more effectively by understanding the specific types of aid being requested.
Important: Due to its custom architecture, this model cannot be used with the standard pipeline("text-classification") function. Please see the usage code below for the correct implementation.
How to Use
This model requires custom code to handle its two-headed output. The following is a complete, self-contained Python script to run inference. You will need to have transformers, torch, safetensors, and huggingface_hub installed (pip install transformers torch safetensors huggingface_hub).
The script automatically downloads all necessary files, including the model weights and metadata. Simply copy the code blocks below and run the script.
The script is broken into logical blocks:
- Model Architecture: A Python class that defines the model's structure. This blueprint is required to load the saved weights.
- Label Definitions: A "decoder ring" of functions to translate the model's numerical outputs into human-readable labels.
- Setup & Loading: A function that handles all the one-time setup.
- Prediction Function: The core logic that takes text and produces a dictionary of predictions.
- Main Execution: An example of how to run the script.
Copy the five code blocks below, in order, into a single file to run the entire inference pipeline with all outputs.
- Model Architecture: We define the necessary imports and the model architecture.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel
from huggingface_hub import hf_hub_download
from typing import Dict, Any
from safetensors.torch import load_file
import json
class MultiHeadClassificationModel(PreTrainedModel):
    """Shared encoder with a sentiment head and a multi-label head."""
    def __init__(self, config, **kwargs):
        super().__init__(config)
        # The multi-label head size is passed as a keyword argument, not read from the config.
        num_multilabels = kwargs.get("num_multilabels")
        if num_multilabels is None:
            raise ValueError("`num_multilabels` must be provided to initialize the model.")
        # Shared encoder built from the config; trained weights are loaded separately.
        self.backbone = AutoModel.from_config(config)
        # Two independent linear heads on top of the shared text representation.
        self.sentiment_classifier = nn.Linear(config.hidden_size, config.num_sentiment_labels)
        self.multilabel_classifier = nn.Linear(config.hidden_size, num_multilabels)
        self.init_weights()
    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = self.backbone(input_ids, attention_mask=attention_mask, **kwargs)
        # Use the hidden state of the first token as the representation of the whole message.
        cls_token_output = outputs.last_hidden_state[:, 0, :]
        sentiment_logits = self.sentiment_classifier(cls_token_output)
        multilabel_logits = self.multilabel_classifier(cls_token_output)
        return {"sentiment_logits": sentiment_logits, "multilabel_logits": multilabel_logits}
- Label Definitions: We embed the label definitions, which are essential for interpreting the model's output.
def get_all_labels() -> Dict[str, Dict[int, str]]:
    return {
        'sentiment': get_sentiment_labels(), 'genre': get_genre_labels(), 'related': get_related_labels(),
        'request': get_request_labels(), 'offer': get_offer_labels(), 'aid_related': get_aid_related_labels(),
        'medical_help': get_medical_help_labels(), 'medical_products': get_medical_products_labels(),
        'search_and_rescue': get_search_and_rescue_labels(), 'security': get_security_labels(),
        'military': get_military_labels(), 'child_alone': get_child_alone_labels(), 'water': get_water_labels(),
        'food': get_food_labels(), 'shelter': get_shelter_labels(), 'clothing': get_clothing_labels(),
        'money': get_money_labels(), 'missing_people': get_missing_people_labels(),
        'refugees': get_refugees_labels(), 'death': get_death_labels(), 'other_aid': get_other_aid_labels(),
        'infrastructure_related': get_infrastructure_related_labels(), 'transport': get_transport_labels(),
        'buildings': get_buildings_labels(), 'electricity': get_electricity_labels(), 'tools': get_tools_labels(),
        'hospitals': get_hospitals_labels(), 'shops': get_shops_labels(), 'aid_centers': get_aid_centers_labels(),
        'other_infrastructure': get_other_infrastructure_labels(), 'weather_related': get_weather_related_labels(),
        'floods': get_floods_labels(), 'storm': get_storm_labels(), 'fire': get_fire_labels(),
        'earthquake': get_earthquake_labels(), 'cold': get_cold_labels(), 'other_weather': get_other_weather_labels(),
        'direct_report': get_direct_report_labels(),
    }
def get_genre_labels() -> Dict[int, str]: return {0: 'direct', 1: 'news', 2: 'social'}
def get_related_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes', 2: 'maybe'}
def get_request_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_offer_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_aid_related_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_medical_help_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_medical_products_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_search_and_rescue_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_security_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_military_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_child_alone_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_water_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_food_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_shelter_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_clothing_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_money_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_missing_people_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_refugees_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_death_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_other_aid_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_infrastructure_related_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_transport_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_buildings_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_electricity_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_tools_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_hospitals_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_shops_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_aid_centers_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_other_infrastructure_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_weather_related_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_floods_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_storm_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_fire_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_earthquake_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_cold_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_other_weather_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_direct_report_labels() -> Dict[int, str]: return {0: 'no', 1: 'yes'}
def get_sentiment_labels() -> Dict[int, str]: return {0: 'negative', 1: 'neutral', 2: 'positive'}
- Setup & Loading: This setup function downloads and loads all components, including metadata.json, from the Hub.
def load_essentials():
    print("Loading model, tokenizer, and metadata... (This may take a moment on first run)")
    
    hub_repo_id = "spencercdz/xlm-roberta-sentiment-requests"
    subfolder = "final_model"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    # Load the model's output structure from the metadata.json file.
    metadata_path = hf_hub_download(repo_id=hub_repo_id, filename="metadata.json", subfolder=subfolder)
    with open(metadata_path, "r") as f:
        file_metadata = json.load(f)
    # Use the metadata to define the number of output neurons for the classification heads.
    binary_tasks = file_metadata["binary_tasks"]
    multiclass_tasks = file_metadata["multiclass_tasks"]
    multilabel_column_names = file_metadata["multilabel_column_names"]
    num_multilabels = len(multilabel_column_names)
    num_sentiment_labels = len(get_sentiment_labels())
    # Load the standard tokenizer and config.
    tokenizer = AutoTokenizer.from_pretrained(hub_repo_id, subfolder=subfolder)
    config = AutoConfig.from_pretrained(hub_repo_id, subfolder=subfolder)
    
    # Add our custom sentiment label count to the config.
    config.num_sentiment_labels = num_sentiment_labels
    # Manually load the custom model, as it's not a standard transformers architecture.
    # Create a model 'shell' with our custom architecture.
    model_shell = MultiHeadClassificationModel(config=config, num_multilabels=num_multilabels)
    
    # Download and load the trained weights.
    weights_path = hf_hub_download(repo_id=hub_repo_id, filename="model.safetensors", subfolder=subfolder)
    state_dict = load_file(weights_path, device="cpu")
    
    # Apply weights to the shell. `strict=False` is required for loading custom heads.
    model_shell.load_state_dict(state_dict, strict=False)
    
    # Move model to the target device and set to evaluation mode.
    model = model_shell.to(device)
    model.eval()
    # Package all components for use in the predict function.
    metadata_for_prediction = {
        "binary_tasks": binary_tasks,
        "multiclass_tasks": multiclass_tasks,
        "multilabel_column_names": multilabel_column_names,
        "all_labels": get_all_labels(),
        "device": device
    }
    print("Loading complete.")
    return model, tokenizer, metadata_for_prediction
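Because `strict=False` lets `load_state_dict` skip keys that do not match, it can be worth confirming that nothing important was skipped. The optional sketch below is not part of the inference script; it uses a toy `nn.Linear` and a hypothetical `load_and_report` helper to show the `missing_keys` and `unexpected_keys` fields that PyTorch returns from `load_state_dict`.

```python
from typing import Dict, List, Tuple

import torch
from torch import nn


def load_and_report(model: nn.Module, state_dict: Dict[str, torch.Tensor]) -> Tuple[List[str], List[str]]:
    """Load weights non-strictly and return (missing_keys, unexpected_keys)."""
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Toy example: the checkpoint lacks `bias` and carries an extra key the model does not have.
toy_model = nn.Linear(4, 2)
toy_checkpoint = {"weight": torch.zeros(2, 4), "extra.bias": torch.zeros(2)}
missing, unexpected = load_and_report(toy_model, toy_checkpoint)
print(missing, unexpected)  # ['bias'] ['extra.bias']
```

Applied to the real model, an empty `missing_keys` list would indicate that every parameter, including both heads, received trained weights.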
- Prediction Function: The prediction function takes the loaded components and input text to produce a decoded dictionary.
def predict(text: str, model, tokenizer, metadata: Dict) -> Dict[str, Any]:
    # Tokenize a single message and move the tensors to the model's device.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512).to(metadata['device'])
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Softmax for the mutually exclusive sentiment classes, sigmoid for the multi-label outputs.
    sentiment_probs = torch.softmax(outputs['sentiment_logits'], dim=-1).cpu().numpy()
    multilabel_probs = torch.sigmoid(outputs['multilabel_logits']).cpu().numpy()
    results = {}
    sentiment_decoder = metadata['all_labels']['sentiment']
    sentiment_pred_idx = sentiment_probs.argmax()
    results['sentiment'] = {'prediction': sentiment_decoder.get(sentiment_pred_idx, "unknown"), 'confidence': sentiment_probs[0, sentiment_pred_idx].item()}
    
    # Binary tasks: one output each, thresholded at 0.5; confidence is the probability of the chosen side.
    for task_name in metadata['binary_tasks']:
        idx = metadata['multilabel_column_names'].index(task_name)
        prob = multilabel_probs[0, idx]
        pred = 1 if prob > 0.5 else 0
        results[task_name] = {'prediction': metadata['all_labels'][task_name][pred], 'confidence': (prob if pred == 1 else 1 - prob).item()}
    # Multi-class tasks: a contiguous block of outputs starting at "<task>_0"; pick the highest-scoring class.
    for task_name, num_classes in metadata['multiclass_tasks'].items():
        start_idx = metadata['multilabel_column_names'].index(f"{task_name}_0")
        task_probs = multilabel_probs[0, start_idx : start_idx + num_classes]
        pred_idx = task_probs.argmax()
        results[task_name] = {'prediction': metadata['all_labels'][task_name].get(pred_idx, "unknown"), 'confidence': task_probs[pred_idx].item()}
    return results
- Main Execution: The main execution block shows how to use the functions and print the raw JSON output.
if __name__ == "__main__":
    model, tokenizer, metadata = load_essentials()
    input_text = "I need food, water, and shelter. Help me! People are dying. We need more items."
    
    print(f"\n--- Predicting for Input ---\n\"{input_text}\"")
    
    predictions = predict(input_text, model, tokenizer, metadata)
    
    # Print the raw dictionary output
    print("\n--- RAW DICTIONARY OUTPUT ---")
    print(json.dumps(predictions, indent=4))
Sample Output
{'sentiment': {'prediction': 'negative', 'confidence': 0.999014139175415},
 'request': {'prediction': 'yes', 'confidence': 0.9999805688858032},
 'offer': {'prediction': 'no', 'confidence': 0.9995545148849487},
 'aid_related': {'prediction': 'yes', 'confidence': 0.9995179176330566},
 'medical_help': {'prediction': 'no', 'confidence': 0.9931818246841431},
 'medical_products': {'prediction': 'no', 'confidence': 0.9975765943527222},
 'search_and_rescue': {'prediction': 'no', 'confidence': 0.9981554746627808},
 'security': {'prediction': 'no', 'confidence': 0.999071478843689},
 'military': {'prediction': 'no', 'confidence': 0.9981452226638794},
 'child_alone': {'prediction': 'no', 'confidence': 0.9998688697814941},
 'water': {'prediction': 'yes', 'confidence': 0.9991873502731323},
 'food': {'prediction': 'yes', 'confidence': 0.9998394250869751},
 'shelter': {'prediction': 'yes', 'confidence': 0.9997198581695557},
 'clothing': {'prediction': 'no', 'confidence': 0.9982467889785767},
 'money': {'prediction': 'no', 'confidence': 0.9985392093658447},
 'missing_people': {'prediction': 'no', 'confidence': 0.998404324054718},
 'refugees': {'prediction': 'no', 'confidence': 0.9981242418289185},
 'death': {'prediction': 'yes', 'confidence': 0.9850122332572937},
 'other_aid': {'prediction': 'no', 'confidence': 0.9654157757759094},
 'infrastructure_related': {'prediction': 'no', 'confidence': 0.984534740447998},
 'transport': {'prediction': 'no', 'confidence': 0.9972304105758667},
 'buildings': {'prediction': 'no', 'confidence': 0.9881182312965393},
 'electricity': {'prediction': 'no', 'confidence': 0.9988776445388794},
 'tools': {'prediction': 'no', 'confidence': 0.9995874166488647},
 'hospitals': {'prediction': 'no', 'confidence': 0.999099850654602},
 'shops': {'prediction': 'no', 'confidence': 0.9996023178100586},
 'aid_centers': {'prediction': 'no', 'confidence': 0.9981774091720581},
 'other_infrastructure': {'prediction': 'no', 'confidence': 0.9968826770782471},
 'weather_related': {'prediction': 'no', 'confidence': 0.9632836580276489},
 'floods': {'prediction': 'no', 'confidence': 0.9960920810699463},
 'storm': {'prediction': 'no', 'confidence': 0.9963870048522949},
 'fire': {'prediction': 'no', 'confidence': 0.9993714094161987},
 'earthquake': {'prediction': 'no', 'confidence': 0.99778151512146},
 'cold': {'prediction': 'no', 'confidence': 0.9991660118103027},
 'other_weather': {'prediction': 'no', 'confidence': 0.9974269866943359},
 'direct_report': {'prediction': 'yes', 'confidence': 0.9763266444206238},
 'genre': {'prediction': 'direct', 'confidence': 0.9912198185920715},
 'related': {'prediction': 'yes', 'confidence': 0.9997092485427856}}
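For triage, the full dictionary is often more than needed. The sketch below shows one possible way to reduce it to the categories flagged 'yes', sorted by confidence. The `flagged_categories` helper and its default threshold are hypothetical, and the `sample` dictionary is a small subset of the output above with values rounded.

```python
from typing import Any, Dict, List, Tuple


def flagged_categories(
    predictions: Dict[str, Dict[str, Any]], min_confidence: float = 0.9
) -> List[Tuple[str, float]]:
    """Return (field, confidence) pairs predicted 'yes', highest confidence first."""
    flagged = [
        (field, entry["confidence"])
        for field, entry in predictions.items()
        if entry["prediction"] == "yes" and entry["confidence"] >= min_confidence
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Subset of the sample output, rounded for readability.
sample = {
    "sentiment": {"prediction": "negative", "confidence": 0.9990},
    "request": {"prediction": "yes", "confidence": 0.99998},
    "offer": {"prediction": "no", "confidence": 0.9996},
    "water": {"prediction": "yes", "confidence": 0.9992},
    "food": {"prediction": "yes", "confidence": 0.9998},
    "death": {"prediction": "yes", "confidence": 0.9850},
    "genre": {"prediction": "direct", "confidence": 0.9912},
}
top = flagged_categories(sample, min_confidence=0.99)
print([field for field, _ in top])  # ['request', 'food', 'water']
```

Note that the three-class related field also uses 'yes' as a label, so it would be picked up by this filter alongside the binary categories.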
Training and evaluation data
This model was fine-tuned on the community-datasets/disaster_response_messages dataset, which contains over 26,000 messages from real disaster events. Each message is labeled with 36 categories, such as aid_related and weather_related, plus the message genre (direct, news, social), giving 37 label fields in total. The sentiment labels were added programmatically for the purpose of this multi-task training.
The dataset was split into:
- Training set: ~21,000 samples
- Validation set: ~2,600 samples
- Test set: ~2,600 samples
Training procedure
The model was trained using the transformers.Trainer with a custom MultiHeadClassificationModel architecture. The training process optimized a combined loss from both the sentiment and multi-label classification heads. The best model was selected based on the F1 Micro score on the validation set.
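The exact form of the combined loss is not stated here. As a rough illustration only, the sketch below assumes a weighted sum of a cross-entropy term for the sentiment head and a mean binary cross-entropy term over all multi-label outputs; the function names and the `multilabel_weight` parameter are hypothetical. It is written in plain Python so the arithmetic is visible; in PyTorch the same two terms correspond to `nn.CrossEntropyLoss` and `nn.BCEWithLogitsLoss`.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one example, computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def bce_with_logits(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def combined_loss(
    sentiment_logits: List[float],
    sentiment_target: int,
    multilabel_logits: List[float],
    multilabel_targets: List[int],
    multilabel_weight: float = 1.0,  # assumed weighting, not taken from the training code
) -> float:
    return cross_entropy(sentiment_logits, sentiment_target) + multilabel_weight * bce_with_logits(
        multilabel_logits, multilabel_targets
    )


# With all-zero logits each term reduces to a known constant: ln(3) + ln(2).
loss = combined_loss([0.0, 0.0, 0.0], 0, [0.0] * 41, [0] * 41)
print(round(loss, 4))  # 1.7918
```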
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 1000 (early stopping patience of 50 epochs)
- mixed_precision_training: Native AMP
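The patience setting means training stops once the monitored validation metric has gone a fixed number of epochs without improving, well before the 1000-epoch cap. The sketch below is a simplified illustration of that rule, not the Trainer's own callback; the function name is hypothetical, and it assumes a higher-is-better metric such as F1 Micro.

```python
from typing import List, Tuple


def run_with_early_stopping(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed, for per-epoch scores."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            # Strict improvement resets the patience window.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here.
            return best_epoch, epoch
    return best_epoch, len(scores)


# Toy validation scores with a small patience so the stop is easy to trace.
toy_scores = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63, 0.62]
print(run_with_early_stopping(toy_scores, patience=3))  # (3, 6)
```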
Training results
The final results on the evaluation set are based on the best checkpoint at epoch 594. A truncated history of selected rows is shown below. For the full data, please refer to training_log.csv in the repository.
| Training Loss | Epoch | Step | Validation Loss | F1 Micro | F1 Macro | Subset Accuracy | 
|---|---|---|---|---|---|---|
| 0.4267 | 1.0 | 658 | 0.2727 | 0.4953 | 0.0722 | 0.1053 | 
| 0.2662 | 2.0 | 1316 | 0.2291 | 0.5446 | 0.0906 | 0.1123 | 
| 0.2366 | 3.0 | 1974 | 0.2143 | 0.5682 | 0.1031 | 0.1279 | 
| 0.2234 | 4.0 | 2632 | 0.2058 | 0.5878 | 0.1160 | 0.1333 | 
| 0.2156 | 5.0 | 3290 | 0.1997 | 0.6022 | 0.1255 | 0.1380 | 
| ... | ... | ... | ... | ... | ... | ... | 
| 0.1773 | 25.0 | 16450 | 0.1670 | 0.6714 | 0.2305 | 0.1955 | 
| 0.1694 | 50.0 | 32900 | 0.1592 | 0.6911 | 0.2701 | 0.2223 | 
| 0.1662 | 75.0 | 49350 | 0.1558 | 0.7018 | 0.2960 | 0.2309 | 
| 0.164 | 100.0 | 65800 | 0.1537 | 0.7077 | 0.3098 | 0.2425 | 
| 0.1627 | 125.0 | 82250 | 0.1522 | 0.7104 | 0.3184 | 0.2449 | 
| 0.1617 | 150.0 | 98700 | 0.1513 | 0.7130 | 0.3243 | 0.2449 | 
| 0.1612 | 175.0 | 115150 | 0.1504 | 0.7143 | 0.3285 | 0.2499 | 
| 0.1606 | 200.0 | 131600 | 0.1498 | 0.7161 | 0.3314 | 0.2515 | 
| 0.16 | 250.0 | 164500 | 0.1488 | 0.7183 | 0.3383 | 0.2538 | 
| 0.1592 | 300.0 | 197400 | 0.1482 | 0.7204 | 0.3423 | 0.2534 | 
| 0.1589 | 350.0 | 230300 | 0.1476 | 0.7214 | 0.3450 | 0.2581 | 
| 0.1584 | 400.0 | 263200 | 0.1474 | 0.7223 | 0.3459 | 0.2588 | 
| 0.1584 | 450.0 | 296100 | 0.1471 | 0.7231 | 0.3487 | 0.2588 | 
| 0.158 | 500.0 | 329000 | 0.1468 | 0.7232 | 0.3494 | 0.2612 | 
| 0.1577 | 550.0 | 361900 | 0.1467 | 0.7239 | 0.3503 | 0.2600 | 
| ... | ... | ... | ... | ... | ... | ... | 
| 0.1574 | 591.0 | 388878 | 0.1466 | 0.7243 | 0.3510 | 0.2596 | 
| 0.1576 | 592.0 | 389536 | 0.1465 | 0.7234 | 0.3496 | 0.2596 | 
| 0.1582 | 593.0 | 390194 | 0.1465 | 0.7239 | 0.3504 | 0.2592 | 
| 0.158 | 594.0 | 390852 | 0.1465 | 0.7240 | 0.3505 | 0.2588 | 
Framework versions
- Transformers 4.52.4
- Pytorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.2
