---
license: cc-by-4.0
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- xlnet/xlnet-base-cased
tags:
- xlnet
- text-classification
- privacy
- trust
- mobile-health
- healthcare
- harpt
- finetuned-model
---


# XLNet-base Fine-Tuned on HARPT

**Model Name**: `XLNet-base-finetuned-HARPT`  
**Tags**: `xlnet`, `text-classification`, `privacy`, `trust`, `mobile-health`, `healthcare`, `harpt`, `custom-dataset`, `finetuned-model`  
**License**: *Creative Commons Attribution 4.0 (CC BY 4.0)*

---

## 📝 Overview

This model is a fine-tuned version of [XLNet-base](https://huggingface.co/xlnet-base-cased) trained on the **HARPT** dataset, a large-scale corpus of mobile health app reviews annotated for privacy- and trust-related concerns. It performs **single-label, multi-class classification** across seven expert-defined categories.

---

## 📂 Classes

The model predicts one of the following seven categories:

- `data_control`
- `data_quality`
- `risk`
- `support`
- `reliability`
- `competence`
- `ethicality`
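
The model returns one logit per class, so a prediction has to be mapped back to a category name. The sketch below shows one way to do that in plain Python. The index order in `HARPT_LABELS` is an assumption that simply mirrors the list above; the helper names `softmax` and `decode_prediction` are illustrative and not part of the released model. Check the checkpoint's `model.config.id2label` before relying on this order.

```python
import math

# ASSUMPTION: index order mirrors the list above. It is not a confirmed
# mapping; verify it against `model.config.id2label` for the checkpoint.
HARPT_LABELS = [
    "data_control", "data_quality", "risk", "support",
    "reliability", "competence", "ethicality",
]

def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_prediction(logits, labels=HARPT_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Made-up logits for a single review, for illustration only
label, prob = decode_prediction([0.2, -1.0, 0.1, 0.4, 3.1, 0.0, -0.5])
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it easier to flag low-confidence predictions for manual review.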

---

## 📊 Training Data

The model was trained on 7,000 manually annotated mobile health app reviews from HARPT. The training subset was balanced using back-translation (BLEU = 30.43) and class downsampling. Annotation followed a multi-pass protocol with adjudication.
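
As a rough illustration of the class downsampling step only, the sketch below trims every class to the size of the smallest one. It is not the authors' pipeline: the function name `downsample_to_balance`, the fixed seed, and the "match the smallest class" rule are all assumptions, and the back-translation step is not shown.

```python
import random
from collections import Counter, defaultdict

def downsample_to_balance(examples, seed=0):
    """Randomly trim every class to the size of the smallest class.

    `examples` is a list of (text, label) pairs.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    # ASSUMPTION: the target size is the smallest class count
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced

# Toy data, not drawn from HARPT
toy = [("a", "risk")] * 5 + [("b", "support")] * 3 + [("c", "competence")] * 2
balanced = downsample_to_balance(toy)
counts = Counter(label for _, label in balanced)
print(counts)
```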

---

## 🔍 Intended Use

- Analyzing trust and privacy concerns in app reviews
- Supporting responsible AI research in digital health
- Benchmarking NLP models on healthcare-oriented text classification

---

## 🧪 Evaluation

Model performance on a held-out test set:

| Metric    | Score    |
|-----------|----------|
| Accuracy  | 86.71%   |
| F1 Score  | 86.68%   |
| Precision | 86.81%   |
| Recall    | 86.71%   |
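
The card does not state how precision, recall, and F1 were averaged across the seven classes. Support-weighted averaging always yields a recall equal to accuracy, which is consistent with the matching 86.71% values above, but that reading is an inference. The sketch below computes the four metrics under that assumption; `weighted_metrics` is a hypothetical helper, and the labels are toy data rather than HARPT results.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, not HARPT results
y_true = ["risk", "risk", "support", "support", "competence", "competence"]
y_pred = ["risk", "support", "support", "support", "competence", "risk"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With macro averaging instead, recall would be the unweighted mean of per-class recalls and would match accuracy only when every class has the same number of test examples.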

---

## 🚀 Usage

```python
import torch
from transformers import XLNetForSequenceClassification, XLNetTokenizerFast

# Load the fine-tuned HARPT model and tokenizer
model_id = "tk648/XLNet-base-finetuned-HARPT"
model = XLNetForSequenceClassification.from_pretrained(model_id)
tokenizer = XLNetTokenizerFast.from_pretrained(model_id)

# Example review
text = "This app keeps crashing when I try to schedule a consultation."

# Tokenize and predict; no_grad skips gradient tracking during inference
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()

print("Predicted class ID:", predicted_class)
# id2label holds generic names such as LABEL_0 unless the checkpoint defines a mapping
print("Predicted label:", model.config.id2label[predicted_class])
```

## 📚 Citation

If you use this model, please cite:

<small><em>
Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, and Praveen Rao. 2025.  
HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM’25).
</em></small>