---
license: cc-by-4.0
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- xlnet/xlnet-base-cased
tags:
- xlnet
- text-classification
- privacy
- trust
- mobile-health
- healthcare
- harpt
- finetuned-model
---

# XLNet-base Fine-Tuned on HARPT

**Model Name**: `XLNet-base-finetuned-HARPT`

**Tags**: `xlnet`, `text-classification`, `privacy`, `trust`, `mobile-health`, `healthcare`, `harpt`, `custom-dataset`, `finetuned-model`

**License**: *Creative Commons 4.0*

---

## 📝 Overview

This is a fine-tuned version of [XLNet-base](https://huggingface.co/xlnet-base-cased) trained on the **HARPT** dataset, a large-scale corpus of mobile health app reviews annotated with labels reflecting privacy and trust-related concerns. The model performs **single-label, multi-class classification** across seven expert-defined categories.

---

## 📂 Classes

The model predicts one of the following seven categories:

- `data_control`
- `data_quality`
- `risk`
- `support`
- `reliability`
- `competence`
- `ethicality`

---

## 📊 Training Data

Training was conducted on 7,000 manually annotated mobile health reviews from HARPT, using a balanced subset created via back-translation (BLEU = 30.43) and class downsampling. Annotation followed a multi-pass protocol with adjudication.

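
The class downsampling mentioned above can be illustrated with a short sketch. It is not the authors' pipeline: the function name `downsample_to_balance`, the `(text, label)` pair format, the fixed seed, and the toy reviews are all assumptions made for illustration, and the back-translation step is not shown.

```python
import random
from collections import Counter, defaultdict


def downsample_to_balance(examples, per_class=None, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Hypothetical helper: each class is randomly reduced to `per_class`
    examples, or to the size of the smallest class if `per_class` is None.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Default target size: the smallest class sets the budget for all classes.
    if per_class is None:
        per_class = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        balanced.extend(rng.sample(items, min(per_class, len(items))))
    rng.shuffle(balanced)
    return balanced


# Toy data (invented for this sketch, not taken from HARPT).
toy_reviews = (
    [(f"risk review {i}", "risk") for i in range(5)]
    + [(f"support review {i}", "support") for i in range(3)]
    + [(f"reliability review {i}", "reliability") for i in range(8)]
)

balanced_reviews = downsample_to_balance(toy_reviews)
print(Counter(label for _, label in balanced_reviews))
```

Downsampling to the smallest class discards data from larger classes, which is presumably why the card pairs it with back-translation to add examples where a class is under-represented.
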

---

## 🔍 Intended Use

- Analyzing trust and privacy concerns in app reviews
- Supporting responsible AI research in digital health
- Benchmarking NLP models on healthcare-oriented text classification

---

## 🧪 Evaluation

Model performance on a held-out test set:

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 86.71% |
| F1 Score  | 86.68% |
| Precision | 86.81% |
| Recall    | 86.71% |

---

## 🚀 Usage

```python
import torch
from transformers import XLNetForSequenceClassification, XLNetTokenizerFast

# Load the fine-tuned HARPT model and tokenizer
model = XLNetForSequenceClassification.from_pretrained("tk648/XLNet-base-finetuned-HARPT")
tokenizer = XLNetTokenizerFast.from_pretrained("tk648/XLNet-base-finetuned-HARPT")

# Example review
text = "This app keeps crashing when I try to schedule a consultation."

# Tokenize and predict; gradients are not needed for inference
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class ID:", predicted_class)
```

## If you use this model, please cite:

Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, and Praveen Rao. 2025. HARPT: A Corpus for Analyzing Consumers’ Trust and Privacy Concerns in Mobile Health Apps. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM’25).
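
## Mapping class IDs to labels

The usage example prints a numeric class ID rather than a category name. The sketch below shows one way to turn a logits vector into a label and a softmax confidence. The ID order in `ASSUMED_ID2LABEL` simply follows the class list in this card and is an assumption, not something the card confirms; `decode_prediction` and the example logits are likewise hypothetical.

```python
import math

# ASSUMPTION: IDs follow the order of the class list in this card.
# Verify against model.config.id2label before relying on this mapping.
ASSUMED_ID2LABEL = {
    0: "data_control",
    1: "data_quality",
    2: "risk",
    3: "support",
    4: "reliability",
    5: "competence",
    6: "ethicality",
}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label=ASSUMED_ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return id2label[class_id], probs[class_id]


# Made-up logits for one review; index 2 has the largest value.
example_logits = [0.1, 0.3, 2.5, 0.0, 1.2, -0.4, 0.2]
label, confidence = decode_prediction(example_logits)
print(f"{label} ({confidence:.2%})")
```

With the loaded model, the logits for a single input could be passed as `outputs.logits[0].tolist()`. If the checkpoint's configuration defines meaningful names, `model.config.id2label` is the safer source for the mapping; if it only contains generic names such as `LABEL_0`, the correct order has to come from the authors.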