PARENT BERT Models for Privacy Policy Analysis

This repository contains TorchScript versions of 15 fine-tuned BERT models used in the PARENT project to analyse mobile app privacy policies. These models identify what data is collected, why it is collected, and how it is processed, helping assess GDPR compliance.

They are part of a hybrid framework designed for non-technical users, particularly parents concerned about children’s privacy.

Model Purpose

- Segment privacy policies to detect:
  - Data collection types (e.g., contact info, location)
  - Purpose of data collection
  - How data is processed
- Support GDPR compliance evaluation
- Detect potential third-party sharing (in combination with a logistic regression model)

References

MAPP Dataset: Arora, S., Hosseini, H., Utz, C., Bannihatti Kumar, V., Dhellemmes, T., Ravichander, A., Story, P., Mangat, J., Chen, R., Degeling, M., Norton, T.B., Hupperich, T., Wilson, S., & Sadeh, N.M. (2022). A tale of two regulatory regimes: Creation and analysis of a bilingual privacy policy corpus. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2022). PDF link [Accessed 12 July 2025].

Usage

import torch
from transformers import BertTokenizerFast
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
REPO_ID = "Bnaad/PARENT_bert"

# Load tokenizer
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Load one TorchScript model from Hugging Face
label_name = "Information Type_Contact information"
safe_label = label_name.replace(" ", "_").replace("/", "_")
filename = f"torchscript_{safe_label}.pt"
model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
model = torch.jit.load(model_path, map_location=device)
model.to(device)
model.eval()

# Example inference
sample_text = """For any questions about your account or our services, please contact our customer support team by emailing [email protected], calling +1-800-555-1234, or visiting our office at 123 Main Street, Springfield, IL, 62701 during business hours"""
inputs = tokenizer(
    sample_text, 
    return_tensors="pt", 
    truncation=True, 
    padding="max_length", 
    max_length=512
).to(device)

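with torch.no_grad():
    # The TorchScript model takes the token IDs and attention mask as positional arguments
    outputs = model(inputs["input_ids"], inputs["attention_mask"])

# Raw model output (logits)
print("Logits:", outputs)
# Sigmoid maps the logits to probabilities in [0, 1]
prob = torch.sigmoid(outputs.squeeze())
print(prob)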
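
The snippet above builds one filename by hand and stops at a sigmoid probability. The following is a minimal sketch of two small helpers that could wrap those steps when working with several of the 15 models. The function names `torchscript_filename` and `logit_to_decision` are hypothetical, and the 0.5 threshold is an assumption: this card does not document a decision threshold, so it may need tuning per label.

```python
import math
from typing import Tuple


def torchscript_filename(label_name: str) -> str:
    """Build the repository filename for a label, mirroring the Usage snippet."""
    # Same sanitisation as in Usage: spaces and slashes become underscores.
    safe_label = label_name.replace(" ", "_").replace("/", "_")
    return f"torchscript_{safe_label}.pt"


def logit_to_decision(logit: float, threshold: float = 0.5) -> Tuple[float, bool]:
    """Apply a sigmoid to one raw logit and compare it with a cut-off.

    The 0.5 default is an assumption, not a value documented for these models.
    """
    # Numerically stable sigmoid: avoids overflow for large negative logits.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        exp_logit = math.exp(logit)
        prob = exp_logit / (1.0 + exp_logit)
    return prob, prob >= threshold
```

Assuming the model returns a single logit per input, as the `squeeze()` call in Usage suggests, the result could be passed in as `logit_to_decision(outputs.squeeze().item())`, and `torchscript_filename(label_name)` could supply the `filename` argument of `hf_hub_download`.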
