Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 10 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split train).
👉Underconfidence issues (1)
For records in your dataset where text contains "like", we found a significantly higher number of underconfident predictions (133 samples, corresponding to 5.60% of the predictions in the data slice).
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "like" | Underconfidence rate = 0.056 | +36.16% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 20997 | @user @user again an improvement. I don't like barely any Nintendo first party exclusives. It's ALL about 3rd party on all systems | neutral | neutral (p = 0.33) |
| | | | positive (p = 0.33) |
| 20623 | Bel Ami you are one strange movie\u002c but I like you and I must keep watching....I may have to read the book. | positive | negative (p = 0.40) |
| | | | positive (p = 0.40) |
| 11228 | Like i know Taylor Swift is touring and she is quality but surely @user can give 'Bad Blood' a rest. 4th time it's been on today | neutral | negative (p = 0.37) |
| | | | positive (p = 0.37) |
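The examples above share one trait: the two most likely classes have nearly identical probabilities. A minimal sketch of that idea follows. It assumes underconfidence is judged by the ratio of the second-highest to the highest class probability; the function names and the 0.9 cut-off are illustrative assumptions, not values stated in this report.

```python
def underconfidence_ratio(probs):
    """Ratio of the second-highest to the highest class probability."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def is_underconfident(probs, threshold=0.9):
    # threshold=0.9 is an assumed cut-off, not a value taken from the report.
    return underconfidence_ratio(probs) >= threshold


# Sample 20623: negative and positive are both reported at p = 0.40.
# The third value is the remainder, filled in here for illustration only.
sample_20623 = [0.40, 0.40, 0.20]
print(underconfidence_ratio(sample_20623), is_underconfident(sample_20623))

# A confident prediction, for contrast (hypothetical probabilities).
print(is_underconfident([0.97, 0.02, 0.01]))
```

A ratio close to 1 means the model could barely separate its top two classes, which is what the paired probabilities in the table show.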
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.7% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.077 | 77/1000 tested samples (7.7%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3784 | "Hey everyone, in case you forgot, ISIS is killing and raping Christians, homosexuals, and many more groups today. Enjoy your Saturday!" | "Hey everyone, in case you forgot, ISIS is killing and raping hindus, homosexuals, and many more groups today. Enjoy your Saturday!" | positive (p = 0.47) | negative (p = 0.51) |
| 32060 | Those of us in the Christian Faith understand we are called to be Christians in every area of life not just on Sunday - @user | Those of us in the hindu Faith understand we are called to be hindus in every area of life not just on Sunday - @user | positive (p = 0.48) | negative (p = 0.54) |
| 35375 | Christianity isn\u2019t just a Sunday thing | hinduism isn\u2019t just a Sunday thing | negative (p = 0.35) | positive (p = 0.41) |
👉Overconfidence issues (2)
For records in the dataset where avg_word_length(text) >= 4.707, we found a significantly higher number of overconfident wrong predictions (2684 samples, corresponding to 25.68% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | avg_word_length(text) >= 4.707 | Overconfidence rate = 0.257 | +13.79% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 19322 | Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife | 6.90909 | negative | positive (p = 0.97) |
| | | | | neutral (p = 0.02) |
| 41888 | I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness. | 5.64286 | positive | negative (p = 0.97) |
| | | | | neutral (p = 0.01) |
| 23434 | Truth a good film with great work by Cate Blanchett. Deserves more screens than that. | 4.73333 | neutral | positive (p = 0.98) |
| | | | | neutral (p = 0.02) |
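The report does not define avg_word_length(text). The sketch below assumes it is the mean length of whitespace-separated tokens, with punctuation and hashtags counted as part of each token. Under that assumption the function reproduces the three values listed in the examples, but the definition remains an inference from the numbers rather than something the scan states.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


# Texts copied from the examples, keyed by their sample index.
examples = {
    19322: "Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife",
    41888: "I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness.",
    23434: "Truth a good film with great work by Cate Blanchett. Deserves more screens than that.",
}

for index, text in examples.items():
    value = avg_word_length(text)
    # 4.707 is the slice boundary quoted in the report.
    print(index, round(value, 5), value >= 4.707)
```

Long hashtags such as #runninglate count as single tokens, which is why hashtag-heavy tweets land in this slice.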
For records in the dataset where avg_whitespace(text) < 0.170, we found a significantly higher number of overconfident wrong predictions (2856 samples, corresponding to 25.01% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | avg_whitespace(text) < 0.170 | Overconfidence rate = 0.250 | +10.83% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 19322 | Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife | 0.116279 | negative | positive (p = 0.97) |
| | | | | neutral (p = 0.02) |
| 41888 | I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness. | 0.141304 | positive | negative (p = 0.97) |
| | | | | neutral (p = 0.01) |
| 23434 | Truth a good film with great work by Cate Blanchett. Deserves more screens than that. | 0.164706 | neutral | positive (p = 0.98) |
| | | | | neutral (p = 0.02) |
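The avg_whitespace(text) feature is not defined in the report either. A plausible reading, assumed here, is the share of whitespace characters in the text. That reading matches the three values in the table, so the sketch may help when reproducing the slice locally.

```python
def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


# Texts copied from the examples, keyed by their sample index.
examples = {
    19322: "Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife",
    41888: "I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness.",
    23434: "Truth a good film with great work by Cate Blanchett. Deserves more screens than that.",
}

for index, text in examples.items():
    value = avg_whitespace(text)
    # 0.170 is the slice boundary quoted in the report.
    print(index, round(value, 6), value < 0.170)
```

Because fewer spaces per character means longer tokens, this slice overlaps heavily with the avg_word_length slice, and the same three samples appear in both tables.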
👉Performance issues (1)
For records in the dataset where text contains "u002c", the Precision is 7.97% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | text contains "u002c" | Precision = 0.421 | -7.97% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 99 | Ricky Ponting and I now have something in common. Today he passed 23\u002c000 1st class runs. Last night at training\u002c i was hit for 23\u002c000 runs. | neutral | negative (p = 0.43) |
| 100 | Ted Nugent talks to us about #hunting and other stuff he\u2019s got on his mind @user this Saturday at 7am on the Great Outdoors\u002c #nuge | neutral | positive (p = 0.52) |
| 106 | "If you\u2019re calling this little thing right here a """"party"""" then Friday night\u2019s gonna be Project X\u002c Y\u002c & Z." | neutral | negative (p = 0.41) |
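The slice here is defined by a literal substring: the dataset stores commas as the escaped sequence \u002c, so the token "u002c" marks tweets that contain a comma. The sketch below shows the slice rule and how a deviation could be computed if it is relative to the global metric. That relative definition is an assumption, as are the function names and the numbers in the last line.

```python
def in_slice(text: str, token: str = "u002c") -> bool:
    """Slice rule quoted in the report: the text contains the token."""
    return token in text


def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Assumed definition: change of the slice metric relative to the global one."""
    return (slice_metric - global_metric) / global_metric


# Raw strings keep the backslash escapes literal, as they appear in the dataset.
sample_99 = r"Today he passed 23\u002c000 1st class runs."
sample_35375 = r"Christianity isn\u2019t just a Sunday thing"
print(in_slice(sample_99), in_slice(sample_35375))

# Hypothetical numbers, only to show the arithmetic (not taken from the report).
print(f"{relative_deviation(0.45, 0.50):+.2%}")
```

If the escape sequences are decoded before inference, this slice would need to be redefined in terms of the comma character itself.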
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 37.4% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.374 | 374/1000 tested samples (37.4%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1944 | "Who is CNN & the GOP establishment keeping off the stage? Carly Fiorina. 5th in Iowa, 3rd & 5th in NH's recent polls. | "WHO IS CNN & THE GOP ESTABLISHMENT KEEPING OFF THE STAGE? CARLY FIORINA. 5TH IN IOWA, 3RD & 5TH IN NH'S RECENT POLLS. | negative (p = 0.41) | positive (p = 0.41) |
| 8960 | Booker T will be addressing the Hulk Hogan controversy this Saturday night on his Heated Conversations podcast. | BOOKER T WILL BE ADDRESSING THE HULK HOGAN CONTROVERSY THIS SATURDAY NIGHT ON HIS HEATED CONVERSATIONS PODCAST. | negative (p = 0.57) | positive (p = 0.55) |
| 9411 | No school for me and Nat tomorrow. Sucks for you guys !!!! @user | NO SCHOOL FOR ME AND NAT TOMORROW. SUCKS FOR YOU GUYS !!!! @USER | negative (p = 0.40) | positive (p = 0.45) |
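The fail rate reported for each perturbation is the share of tested samples whose predicted label changes after the transformation (374 of 1000 here). A minimal sketch of that check follows. The toy_predict function is a hypothetical, deliberately case-sensitive stand-in for the real model, and fail_rate is an illustrative helper rather than part of the scan's API.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> float:
    """Share of texts whose predicted label changes after the transformation."""
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for the real classifier: it only recognises
    # the exact casing "Sucks", so uppercasing the text flips its output.
    return "negative" if "Sucks" in text else "neutral"


texts = [
    "No school for me and Nat tomorrow. Sucks for you guys !!!!",
    "Are you going to SJP tomorrow night pal?",
]
print(fail_rate(toy_predict, str.upper, texts))
```

Swapping str.upper for str.lower, str.title, or a custom function gives the same check for the other perturbations in this section.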
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 30.6% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.306 | 306/1000 tested samples (30.6%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 41399 | "what happened at the concert, anything momentous. how was 18, lilo fights, niall's dancing ??c'mon give me the details" | "What Happened At The Concert, Anything Momentous. How Was 18, Lilo Fights, Niall'S Dancing ??C'Mon Give Me The Details" | negative (p = 0.50) | positive (p = 0.58) |
| 12386 | "Red Sox may be losing, but the no hitter is still intact." | "Red Sox May Be Losing, But The No Hitter Is Still Intact." | negative (p = 0.44) | positive (p = 0.37) |
| 14035 | Wait is the jelly and Michael episode that Shawn is on tomorrow new??? | Wait Is The Jelly And Michael Episode That Shawn Is On Tomorrow New??? | negative (p = 0.53) | positive (p = 0.41) |
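The perturbed text in sample 41399 contains "Niall'S" and "C'Mon", which is what Python's built-in str.title() produces, since it capitalizes the first letter after any non-letter character, apostrophes included. Whether the scan actually calls str.title() is an assumption. The sketch below only shows that the built-in reproduces the example, and contrasts it with string.capwords, which splits on whitespace only.

```python
import string

original = (
    "what happened at the concert, anything momentous. how was 18, "
    "lilo fights, niall's dancing ??c'mon give me the details"
)

# str.title() restarts capitalisation after every non-letter character.
print(original.title())

# string.capwords() capitalises only the first character of each
# whitespace-separated word, so apostrophes are left alone.
print(string.capwords("niall's dancing ??c'mon"))
```

Some of the prediction changes counted under this perturbation may therefore stem from unnatural tokens such as "Niall'S" rather than from title casing as a person would write it.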
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 17.5% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.175 | 175/1000 tested samples (17.5%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 27942 | @user no offense but zayn left on the 25th an my organs are on the g r o u n d | @user nk offense but zayn left on the 25th an my roans are on the g r o u n d | neutral (p = 0.50) | negative (p = 0.53) |
| 43029 | "I can't even remotely believe that I'm going to see Sam Smith in person, with my own two eyes, this Friday." | "I can't even remotely belideve that I'm goking to see Sam Smith in perskon, with my own two eyes, this Friday." | neutral (p = 0.38) | negative (p = 0.51) |
| 589 | @user Are you going to SJP tomorrow night pal? | @user Ars you going to SJP tomorrow night pal? | negative (p = 0.37) | positive (p = 0.40) |
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 14.4% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.144 | 144/1000 tested samples (14.4%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8965 | "People are reporting that the Destiny free trial for PS4 on the PS Store may well be the complete base game, as... | "people are reporting that the destiny free trial for ps4 on the ps store may well be the complete base game, as... | positive (p = 0.51) | negative (p = 0.42) |
| 10219 | It's Saturday!!! Bring on the Foo Fighters tonight!!! | it's saturday!!! bring on the foo fighters tonight!!! | positive (p = 0.82) | negative (p = 0.70) |
| 9414 | "Took down all my disney star posters (and Twilight, about time) so I can put up all the HP ones I got today from a poster book (100 posters)" | "took down all my disney star posters (and twilight, about time) so i can put up all the hp ones i got today from a poster book (100 posters)" | positive (p = 0.40) | negative (p = 0.52) |
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.8% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.068 | 68/1000 tested samples (6.8%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 14002 | "Seinfeld broke several conventions of mainstream television. It is often described as being "a show about nothing"" | Seinfeld broke several conventions of mainstream television It is often described as being a show about nothing | negative (p = 0.44) | neutral (p = 0.49) |
| 592 | Spur of the moment....Going to San Diego for the #NFL Game tonight! Psyched! #NHL Ducks Game tomorrow! Great start to the weekend! =) | Spur of the moment Going to San Diego for the #NFL Game tonight Psyched #NHL Ducks Game tomorrow Great start to the weekend =) | negative (p = 0.50) | positive (p = 0.46) |
| 43568 | @user Jersey Shore begins in Spain today !! I can't wait at the night ! | @user Jersey Shore begins in Spain today I can t wait at the night | positive (p = 0.51) | neutral (p = 0.40) |
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.