Update README.md
README.md
````diff
@@ -1,11 +1,13 @@
 ---
 library_name: transformers
-license:
+license: mit
 language:
 - en
 metrics:
 - accuracy
 pipeline_tag: text-classification
+tags:
+- fake news
 ---
 
 # Model Card for Model ID
@@ -34,11 +36,9 @@ This model can be used for whatever reason you need, also a site hosted, based o
 ## Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-
-
-it might also be biased toward certain names.
-
+As a BERT model, it also exhibits bias. It is not cutting-edge, as it was trained
+on outdated data (pre-2022). This makes it unreliable for fact-checking fake news related to military conflicts in Ukraine,
+Israel, etc. Additionally, people's names in the data were not preprocessed, which might bias the model towards certain names.
 ### Recommendations
 
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
@@ -56,7 +56,7 @@ pipe.predict('Some text')
 
 It will return something like this:
 [{'label': 'LABEL_0', 'score': 0.7248537290096283}]
-Where 'LABEL_0' means false and score
+Here, 'LABEL_0' means false, and 'score' is the probability the model assigns to that label.
 
 ### Training Data
 
@@ -65,7 +65,8 @@ https://huggingface.co/datasets/GonzaloA/fake_news
 https://github.com/GeorgeMcIntire/fake_real_news_dataset
 
 #### Preprocessing
-Preprocessing was made by using this function
+Preprocessing was done using this function. Note that the test data below was not truncated to
+12 >= len(new_filtered_words) >= 6, but it was still preprocessed.
 ```
 import re
 import string
@@ -124,13 +125,6 @@ The following hyperparameters were used during training:
 - weight_decay: 0.03
 - random seed: 42
 
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
-
-
 ### Testing Data, Metrics
 
 #### Testing Data
@@ -184,4 +178,4 @@ weighted avg 0.9731 0.9731 0.9731 19996
 
 #### Hardware
 
-Tesla T4 GPU, available for free in Google Colab
+Tesla T4 GPU, available for free in Google Colab
````
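The updated README says that 'LABEL_0' means false and that 'score' is the probability for that label. Below is a minimal sketch of turning the example output into a readable verdict. It assumes the pipeline returns a list of `{'label': ..., 'score': ...}` dicts, as in the README example, and it assumes that 'LABEL_1' means true, which the README does not state. `interpret_prediction` and `LABEL_MEANINGS` are hypothetical names, not part of the model or of `transformers`.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping: only LABEL_0 -> "false" is documented in the README;
# LABEL_1 -> "true" is an assumption.
LABEL_MEANINGS: Dict[str, str] = {"LABEL_0": "false", "LABEL_1": "true"}


def interpret_prediction(output: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return (meaning, score) for the top entry of a text-classification result."""
    if not output:
        raise ValueError("empty pipeline output")
    top = output[0]
    label = str(top["label"])
    score = float(top["score"])
    # Fall back to the raw label name if it is not in the mapping.
    meaning = LABEL_MEANINGS.get(label, label)
    return meaning, score


if __name__ == "__main__":
    # Example output copied from the README.
    example = [{'label': 'LABEL_0', 'score': 0.7248537290096283}]
    print(interpret_prediction(example))  # ('false', 0.7248537290096283)
```

Unknown labels are passed through unchanged rather than raising, so the helper keeps working if the model's label names differ from this assumed mapping.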
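The preprocessing note refers to the condition `12 >= len(new_filtered_words) >= 6`, but the author's function is cut off in this diff after `import re` and `import string`. The sketch below is therefore not that function. It assumes preprocessing means lowercasing, replacing characters from `string.punctuation` with spaces, and splitting on whitespace, and it assumes the length condition acts as a keep/drop filter on training examples. `preprocess` and `keep_for_training` are hypothetical names.

```python
import re
import string
from typing import List

MIN_WORDS = 6   # lower bound from the README condition
MAX_WORDS = 12  # upper bound from the README condition

# Character class matching any ASCII punctuation character (escaped for regex use).
_PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def preprocess(text: str) -> List[str]:
    """Hypothetical cleanup: lowercase, replace punctuation with spaces, split."""
    cleaned = _PUNCT_RE.sub(" ", text.lower())
    return cleaned.split()


def keep_for_training(new_filtered_words: List[str]) -> bool:
    """Apply the README's length condition: 12 >= len(new_filtered_words) >= 6."""
    return MAX_WORDS >= len(new_filtered_words) >= MIN_WORDS


if __name__ == "__main__":
    words = preprocess("Breaking: officials confirm the report was fabricated!")
    # ['breaking', 'officials', 'confirm', 'the', 'report', 'was', 'fabricated'] True
    print(words, keep_for_training(words))
```

Per the README, the test data was still preprocessed but not restricted by this length condition, so under these assumptions only `preprocess` would apply to it.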
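The last hunk header shows a `weighted avg` row (0.9731, 0.9731, 0.9731, support 19996) that looks like the output of scikit-learn's `classification_report`. Assuming that is its source, each weighted value is the per-class metric averaged with weights equal to each class's support (number of true samples). A small sketch of that computation, using the hypothetical helper `weighted_average`:

```python
from typing import Sequence


def weighted_average(values: Sequence[float], supports: Sequence[int]) -> float:
    """Support-weighted mean of per-class scores, as in a 'weighted avg' row."""
    if len(values) != len(supports):
        raise ValueError("values and supports must have the same length")
    total = sum(supports)
    if total <= 0:
        raise ValueError("total support must be positive")
    # Each class contributes its score in proportion to how many samples it has.
    return sum(v * s for v, s in zip(values, supports)) / total
```

Given the per-class values and supports from the full report (not included in this diff), this should reproduce the 0.9731 figures up to rounding.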