+---
+language: en
+license: apache-2.0
+library_name: transformers
+tags:
+  - bert
+  - text-classification
+  - privacy-policy
+  - gdpr
+  - torchscript
+datasets:
+  - MAPP-116
+metrics:
+  - f1
+model-index:
+  - name: PARENT BERT
+    results:
+      - task:
+          type: text-classification
+        dataset:
+          name: MAPP-116
+          type: text
+        metrics:
+          - name: F1
+            type: f1
+            value: 0.80  # replace with your actual F1 score
+---
+# PARENT BERT Models for Privacy Policy Analysis
+This repository contains **TorchScript versions of 15 fine-tuned BERT models** used in the PARENT project to analyse mobile app privacy policies. These models identify **what data is collected, why it is collected, and how it is processed**, helping assess GDPR compliance.
+They are part of a hybrid framework designed for non-technical users, particularly parents concerned about children’s privacy.
+---
+## Model Purpose
+- Classify segments of a privacy policy to detect:
+  - Data collection types (e.g., contact info, location)
+  - Purpose of data collection
+  - How data is processed
+- Support GDPR compliance evaluation
+- Detect potential third-party sharing (in combination with a logistic regression model)
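The detection targets above could be combined per policy segment roughly as follows. This is a minimal sketch, not the PARENT project's actual aggregation logic. It assumes each of the binary classifiers yields one probability per segment, that label names follow a `Category_Label` pattern (inferred from the single label name `Information Type_Contact information` in this README), and that a 0.5 threshold is acceptable. The probabilities, the other two label names, and the helper `detected_labels` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-label probabilities, as if produced by running each
# binary classifier on one policy segment. Only the first label name
# appears in this README; the other two are illustrative placeholders.
segment_probs = {
    "Information Type_Contact information": 0.93,
    "Information Type_Location": 0.12,
    "Purpose_Advertising": 0.64,
}

THRESHOLD = 0.5  # assumed decision threshold, not a documented value


def detected_labels(probs, threshold=THRESHOLD):
    """Group labels whose probability meets the threshold by category prefix."""
    grouped = defaultdict(list)
    for full_name, p in probs.items():
        if p >= threshold:
            # Split "Category_Label" on the first underscore only.
            category, _, label = full_name.partition("_")
            grouped[category].append(label)
    return dict(grouped)


result = detected_labels(segment_probs)
print(result)
```

Grouping by category prefix keeps the output aligned with the three questions the models address (what is collected, why, and how it is processed). The threshold would normally be tuned per label on validation data.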
+---
+## References
+- **MAPP Dataset:** Arora, S., Hosseini, H., Utz, C., Bannihatti Kumar, V., Dhellemmes, T., Ravichander, A., Story, P., Mangat, J., Chen, R., Degeling, M., Norton, T.B., Hupperich, T., Wilson, S., & Sadeh, N.M. (2022). *A tale of two regulatory regimes: Creation and analysis of a bilingual privacy policy corpus*. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2022). [PDF link](https://aclanthology.org/2022.lrec-1.585.pdf) [Accessed 12 July 2025].
+---
+## Usage
+```python
+import torch
+from transformers import BertTokenizerFast
+from huggingface_hub import hf_hub_download
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+REPO_ID = "Bnaad/PARENT_bert"
+# Load tokenizer
+tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+# Load one TorchScript model from Hugging Face
+label_name = "Information Type_Contact information"
+safe_label = label_name.replace(" ", "_").replace("/", "_")
+filename = f"torchscript_{safe_label}.pt"
+model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
+model = torch.jit.load(model_path, map_location=device)
+model.to(device)
+model.eval()
+# Example inference
+sample_text = """For any questions about your account or our services, please contact our customer support team by emailing [email protected], calling +1-800-555-1234, or visiting our office at 123 Main Street, Springfield, IL, 62701 during business hours"""
+inputs = tokenizer(
+    sample_text,
+    return_tensors="pt",
+    truncation=True,
+    padding="max_length",
+    max_length=512
+).to(device)
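+# Run the forward pass without tracking gradients
+with torch.no_grad():
+    outputs = model(inputs["input_ids"], inputs["attention_mask"])
+print("Logits:", outputs)
+# Apply a sigmoid to turn the raw logit into a probability for this label
+prob = torch.sigmoid(outputs.squeeze())
+print("Probability:", prob)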