AiresPucrs/toxicity-classifier

nicholasKluge committed on Oct 13, 2024

Commit 7713f55 (verified) · 1 parent: 78b1f35

Update README.md


README.md +31 -66


@@ -6,87 +6,52 @@ datasets:
 - AiresPucrs/toxic-comments
 library_name: transformers
 ---
-# Toxicity-classifier
-## Model Overview
-The toxicity classifier is used to differentiate between non-toxic and toxic comments.
-The model was trained with a dataset composed of toxic and non-toxic comments extracted from web forums.
-## Details
-- **Size:** 4,689,681 parameters
-- **Model type:** Transformer
-- **Number of Epochs:** 20
-- **Batch Size:** 16
-- **Optimizer:** Adam
-- **Learning Rate:** 0.001
-- **Hardware:** Tesla V4
-- **Emissions:** Not measured
-- **Total Energy Consumption:** Not measured
-## How to Use
-⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️
-```python
-import tensorflow as tf
-toxicity_model = tf.keras.models.load_model('toxicity_model.keras')
-with open('toxic_vocabulary.txt', encoding='utf-8') as fp:
-    vocabulary = [line.strip() for line in fp]
-    fp.close()
 vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
-                                        output_mode="int",
-                                        output_sequence_length=100,
-                                        vocabulary=vocabulary)
 strings = [
-    'I think you should shut up your big mouth',
-    'I do not agree with you'
 ]
 preds = toxicity_model.predict(vectorization_layer(strings),verbose=0)
 for i, string in enumerate(strings):
-    print(f'{string}\n')
-    print(f'Toxic 🤬 {round((1 - preds[i][0]) * 100, 2)}% | Not toxic 😊 {round(preds[i][0] * 100, 2)}\n')
-    print("_" * 50)
-```
-This will output the following:
-```
-I think you should shut up your big mouth
-Toxic 🤬 95.73% | Not toxic 😊 4.27
-__________________________________________________
-I do not agree with you
-Toxic 🤬 0.99% | Not toxic 😊 99.01
-__________________________________________________
-```
-## Training Data
-- **Dataset:** [Toxic Comment Classification Challenge Dataset](https://huggingface.co/datasets/AiresPucrs/toxic-comments)
-## Cite as
-```latex
-@misc{teenytinycastle,
-    doi = {10.5281/zenodo.7112065},
-    url = {https://github.com/Nkluge-correa/teeny-tiny_castle},
-    author = {Nicholas Kluge Corr{\^e}a},
-    title = {Teeny-Tiny Castle},
-    year = {2024},
-    publisher = {GitHub},
-    journal = {GitHub repository},
-}
-```
-## License
-This model is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.

 - AiresPucrs/toxic-comments
 library_name: transformers
 ---
+# Toxicity Classifier (Teeny-Tiny Castle)
+This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.
+## How to Use
+```python
+import tensorflow as tf
+from huggingface_hub import hf_hub_download
+# Download the model (this will be the target of our attack)
+hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
+                filename="toxicity-classifier/toxicity-model.keras",
+                local_dir="./",
+                repo_type="model"
+ )
+# Download the tokenizer file
+hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
+                filename="toxic-vocabulary.txt",
+                local_dir="./",
+                repo_type="model"
+ )
+toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')
+# If you cloned the model repo, the path is toxicity_model/toxic_vocabulary.txt
+with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
+    vocabulary = [line.strip() for line in fp]
 vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
+                                        output_mode="int",
+                                        output_sequence_length=100,
+                                        vocabulary=vocabulary)
 strings = [
+    'I think you should shut up your big mouth',
+    'I do not agree with you'
 ]
 preds = toxicity_model.predict(vectorization_layer(strings),verbose=0)
 for i, string in enumerate(strings):
+    print(f'{string}\n')
+    print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}%\n')
+    print("_" * 50)
+```
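
Reading the snippet in this diff, `preds[i][0]` appears to be the model's not-toxic probability for input `i`, so the toxic score is its complement. Below is a minimal sketch of turning those scores into labels, assuming that interpretation holds. The helper name `label_predictions` and the 0.5 threshold are hypothetical and do not come from the README.

```python
def label_predictions(preds, threshold=0.5):
    """Map rows of [p_not_toxic] to (label, toxic_probability) pairs.

    Assumption: preds[i][0] is the probability that input i is NOT toxic,
    as the README's print statement implies. The 0.5 threshold is
    illustrative only.
    """
    results = []
    for row in preds:
        # The toxic probability is the complement of the model output.
        p_toxic = 1.0 - float(row[0])
        label = "toxic" if p_toxic >= threshold else "not toxic"
        results.append((label, round(p_toxic, 4)))
    return results


# Not-toxic scores taken from the example output in the previous README version.
example_preds = [[0.0427], [0.9901]]
for label, p_toxic in label_predictions(example_preds):
    print(f"{label}: {p_toxic * 100:.2f}% toxic")
```

The helper accepts any sequence of rows, so the array returned by `toxicity_model.predict(...)` could be passed in directly. A different threshold may suit applications where false positives and false negatives carry different costs.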