Commit 13aea0b by Bnaad · verified · 1 Parent(s): 4723c43

Update README.md

README.md CHANGED (+94 -3)
@@ -1,3 +1,94 @@
- ---
- license: unknown
- ---
+ ---
+ language: en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - bert
+ - text-classification
+ - privacy-policy
+ - gdpr
+ - torchscript
+ datasets:
+ - MAPP-116
+ metrics:
+ - f1
+ model-index:
+ - name: PARENT BERT
+   results:
+   - task:
+       type: text-classification
+     dataset:
+       name: MAPP-116
+       type: text
+     metrics:
+     - name: f1
+       type: score
+       value: 0.80 # replace with your actual F1 score
+ ---
+
+
+
+
+ # PARENT BERT Models for Privacy Policy Analysis
+
+ This repository contains **TorchScript versions of 15 fine-tuned BERT models** used in the PARENT project to analyse mobile app privacy policies. These models identify **what data is collected, why it is collected, and how it is processed**, helping assess GDPR compliance.
+
+ They are part of a hybrid framework designed for non-technical users, particularly parents concerned about children’s privacy.
+
+ ---
+
+ ## Model Purpose
+
+ - Segment privacy policies to detect:
+   - Data collection types (e.g., contact info, location)
+   - Purpose of data collection
+   - How data is processed
+ - Support GDPR compliance evaluation
+ - Detect potential third-party sharing (in combination with a logistic regression model)
+
+ ---
+ ## References
+
+ - **MAPP Dataset:** Arora, S., Hosseini, H., Utz, C., Bannihatti Kumar, V., Dhellemmes, T., Ravichander, A., Story, P., Mangat, J., Chen, R., Degeling, M., Norton, T.B., Hupperich, T., Wilson, S., & Sadeh, N.M. (2022). *A tale of two regulatory regimes: Creation and analysis of a bilingual privacy policy corpus*. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2022). [PDF link](https://aclanthology.org/2022.lrec-1.585.pdf) [Accessed 12 July 2025].
+ ---
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import BertTokenizerFast
+ from huggingface_hub import hf_hub_download
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ REPO_ID = "Bnaad/PARENT_bert"
+
+ # Load tokenizer
+ tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+
+ # Load one TorchScript model from Hugging Face
+ label_name = "Information Type_Contact information"
+ safe_label = label_name.replace(" ", "_").replace("/", "_")
+ filename = f"torchscript_{safe_label}.pt"
+ model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
+ model = torch.jit.load(model_path, map_location=device)
+ model.to(device)
+ model.eval()
+
+ # Example inference
+ sample_text = """For any questions about your account or our services, please contact our customer support team by emailing [email protected], calling +1-800-555-1234, or visiting our office at 123 Main Street, Springfield, IL, 62701 during business hours"""
+ inputs = tokenizer(
+     sample_text,
+     return_tensors="pt",
+     truncation=True,
+     padding="max_length",
+     max_length=512
+ ).to(device)
+
+ with torch.no_grad():
+     outputs = model(inputs["input_ids"], inputs["attention_mask"])
+
+ print("Logits:", outputs)
+ # Convert the logit to a probability for this label
+ prob = torch.sigmoid(outputs.squeeze())
+ print(prob)
+ ```
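Note on the usage snippet: it derives the TorchScript filename from a label name and applies a sigmoid to the model output. A minimal sketch of those two steps as standalone helpers follows. The function names `label_to_filename` and `logit_to_decision` are hypothetical, and the 0.5 decision threshold is an assumption for illustration, not a value stated in the README.

```python
import math
from typing import Tuple


def label_to_filename(label_name: str) -> str:
    """Build the TorchScript filename the same way the usage snippet does."""
    # Spaces and slashes are replaced with underscores, as in the README code.
    safe_label = label_name.replace(" ", "_").replace("/", "_")
    return f"torchscript_{safe_label}.pt"


def logit_to_decision(logit: float, threshold: float = 0.5) -> Tuple[float, bool]:
    """Map a single logit to (probability, is_positive).

    The threshold default of 0.5 is an assumed value, not taken from the README.
    """
    # Numerically stable sigmoid: avoid overflow for large negative logits.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob = e / (1.0 + e)
    return prob, prob >= threshold


if __name__ == "__main__":
    print(label_to_filename("Information Type_Contact information"))
    print(logit_to_decision(0.0))
```

With the TorchScript model from the snippet, the scalar passed to `logit_to_decision` would presumably be `outputs.squeeze().item()`, assuming the model returns a single logit per input.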