Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

#89
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 10 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split train).

👉Underconfidence issues (1)

For records in your dataset where text contains "like", we found a significantly higher number of underconfident predictions (133 samples, corresponding to 5.60% of the predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text contains "like" | Underconfidence rate = 0.056 | +36.16% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
| --- | --- | --- | --- |
| 20997 | @user @user again an improvement. I don't like barely any Nintendo first party exclusives. It's ALL about 3rd party on all systems | neutral | neutral (p = 0.33); positive (p = 0.33) |
| 20623 | Bel Ami you are one strange movie\u002c but I like you and I must keep watching....I may have to read the book. | positive | negative (p = 0.40); positive (p = 0.40) |
| 11228 | Like i know Taylor Swift is touring and she is quality but surely @user can give 'Bad Blood' a rest. 4th time it's been on today | neutral | negative (p = 0.37); positive (p = 0.37) |
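
The underconfidence rate reported above can be illustrated with a small sketch. It assumes (the report does not state this) that a prediction counts as underconfident when the runner-up class probability is nearly as high as the top one. The function name `underconfidence_rate`, the 0.90 ratio threshold, and the toy probabilities are all illustrative, not values taken from the scan.

```python
from typing import Sequence


def underconfidence_rate(
    probs: Sequence[Sequence[float]], ratio_threshold: float = 0.90
) -> float:
    """Fraction of rows whose second-highest probability is close to the highest.

    Assumption: "underconfident" means p_second / p_top >= ratio_threshold.
    """
    if not probs:
        return 0.0
    flagged = 0
    for row in probs:
        top, second = sorted(row, reverse=True)[:2]
        if top > 0 and second / top >= ratio_threshold:
            flagged += 1
    return flagged / len(probs)


# Toy class probabilities (negative, neutral, positive); not taken from the scan.
example_probs = [
    [0.34, 0.33, 0.33],  # near tie between classes -> flagged
    [0.40, 0.20, 0.40],  # exact tie between two classes -> flagged
    [0.05, 0.05, 0.90],  # clear winner -> not flagged
    [0.10, 0.60, 0.30],  # clear winner -> not flagged
]
rate = underconfidence_rate(example_probs)
print(rate)  # 0.5
```

Under this definition, the near-tied rows in the examples (for instance two classes both at p = 0.40) would be flagged.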
👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.7% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.077 | 77/1000 tested samples (7.7%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 3784 | "Hey everyone, in case you forgot, ISIS is killing and raping Christians, homosexuals, and many more groups today. Enjoy your Saturday!" | "Hey everyone, in case you forgot, ISIS is killing and raping hindus, homosexuals, and many more groups today. Enjoy your Saturday!" | positive (p = 0.47) | negative (p = 0.51) |
| 32060 | Those of us in the Christian Faith understand we are called to be Christians in every area of life not just on Sunday - @user | Those of us in the hindu Faith understand we are called to be hindus in every area of life not just on Sunday - @user | positive (p = 0.48) | negative (p = 0.54) |
| 35375 | Christianity isn\u2019t just a Sunday thing | hinduism isn\u2019t just a Sunday thing | negative (p = 0.35) | positive (p = 0.41) |
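
To make the "Switch Religion" perturbation concrete, here is a rough sketch of a term-swap transformation. The swap table is hypothetical: it is inferred only from the three examples above, and the scan's actual mapping is likely larger and may differ. The names `SWAPS` and `switch_religion` are illustrative.

```python
import re

# Hypothetical swap table inferred from the examples; not the scan's real mapping.
SWAPS = {
    "christianity": "hinduism",
    "christians": "hindus",
    "christian": "hindu",
}

# Longest terms first so "christianity" is not matched as "christian" + "ity".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SWAPS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def switch_religion(text: str) -> str:
    """Replace each known term with its lowercase counterpart, as in the examples."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1).lower()], text)


original = "Christianity isn't just a Sunday thing"
perturbed = switch_religion(original)
print(perturbed)  # hinduism isn't just a Sunday thing
```

A sentiment prediction that flips between `original` and `perturbed` would count toward the fail rate reported in this section.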
👉Overconfidence issues (2)

For records in the dataset where avg_word_length(text) >= 4.707, we found a significantly higher number of overconfident wrong predictions (2684 samples, corresponding to 25.68% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | avg_word_length(text) >= 4.707 | Overconfidence rate = 0.257 | +13.79% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | avg_word_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 19322 | Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife | 6.90909 | negative | positive (p = 0.97); neutral (p = 0.02) |
| 41888 | I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness. | 5.64286 | positive | negative (p = 0.97); neutral (p = 0.01) |
| 23434 | Truth a good film with great work by Cate Blanchett. Deserves more screens than that. | 4.73333 | neutral | positive (p = 0.98); neutral (p = 0.02) |
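
As a rough illustration of the overconfidence rate, the sketch below computes the share of wrong predictions that are overconfident. It assumes (the report does not give the definition) that a wrong prediction is overconfident when the predicted class probability exceeds the true label's probability by at least a fixed gap. The function name, the 0.5 gap, and the toy data are illustrative.

```python
from typing import Sequence


def overconfidence_rate(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    gap_threshold: float = 0.5,
) -> float:
    """Share of wrong predictions where p_predicted - p_true >= gap_threshold.

    Assumption: the rate is measured over wrong predictions only, matching the
    report's phrasing "of the wrong predictions in the data slice".
    """
    wrong = 0
    overconfident = 0
    for row, true_idx in zip(probs, labels):
        pred_idx = max(range(len(row)), key=row.__getitem__)
        if pred_idx == true_idx:
            continue  # correct predictions are ignored
        wrong += 1
        if row[pred_idx] - row[true_idx] >= gap_threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


# Toy data, classes indexed as 0 = negative, 1 = neutral, 2 = positive.
toy_probs = [
    [0.01, 0.02, 0.97],  # true 0: wrong and overconfident
    [0.40, 0.35, 0.25],  # true 1: wrong but not overconfident
    [0.10, 0.10, 0.80],  # true 2: correct
    [0.97, 0.01, 0.02],  # true 2: wrong and overconfident
]
toy_labels = [0, 1, 2, 2]
oc_rate = overconfidence_rate(toy_probs, toy_labels)
print(round(oc_rate, 3))  # 0.667
```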

For records in the dataset where avg_whitespace(text) < 0.170, we found a significantly higher number of overconfident wrong predictions (2856 samples, corresponding to 25.01% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | avg_whitespace(text) < 0.170 | Overconfidence rate = 0.250 | +10.83% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | avg_whitespace(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 19322 | Hot Mess Monday! I'm feeling it! #runninglate #crazyday #blah #happymonday #lovemylife | 0.116279 | negative | positive (p = 0.97); neutral (p = 0.02) |
| 41888 | I hate sweating. Advantage of tropical islands. Sun. Advantage of Wisconsin is the coldness. | 0.141304 | positive | negative (p = 0.97); neutral (p = 0.01) |
| 23434 | Truth a good film with great work by Cate Blanchett. Deserves more screens than that. | 0.164706 | neutral | positive (p = 0.98); neutral (p = 0.02) |
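
The two slicing features used in this section can be approximated as follows. These formulas are an assumption: they are not quoted from the scan, but they reproduce the `avg_word_length(text)` and `avg_whitespace(text)` values shown for the example rows (mean length of whitespace-separated tokens, and the fraction of characters that are whitespace).

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (punctuation included)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def avg_whitespace(text: str) -> float:
    """Fraction of characters in the text that are whitespace."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


sample = (
    "Truth a good film with great work by Cate Blanchett. "
    "Deserves more screens than that."
)
print(round(avg_word_length(sample), 5))  # 4.73333
print(round(avg_whitespace(sample), 6))   # 0.164706
```

Because long hashtags raise the average token length and lower the whitespace ratio at the same time, the two slices overlap heavily, which is consistent with both tables listing the same example rows.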
👉Performance issues (1)

For records in the dataset where text contains "u002c", the Precision is 7.97% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | text contains "u002c" | Precision = 0.421 | -7.97% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
| --- | --- | --- | --- |
| 99 | Ricky Ponting and I now have something in common. Today he passed 23\u002c000 1st class runs. Last night at training\u002c i was hit for 23\u002c000 runs. | neutral | negative (p = 0.43) |
| 100 | Ted Nugent talks to us about #hunting and other stuff he\u2019s got on his mind @user this Saturday at 7am on the Great Outdoors\u002c #nuge | neutral | positive (p = 0.52) |
| 106 | "If you\u2019re calling this little thing right here a """"party"""" then Friday night\u2019s gonna be Project X\u002c Y\u002c & Z." | neutral | negative (p = 0.41) |
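
The slice keyword `u002c` appears because the texts contain literal escape sequences such as `\u002c` (a comma) and `\u2019` (a right single quotation mark). One way to check whether these escapes contribute to the precision drop is to decode them before inference and rerun the evaluation. The sketch below is an assumption about a possible preprocessing step, not something the report prescribes; `decode_unicode_escapes` is an illustrative name, and it handles only four-digit `\uXXXX` sequences.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Turn literal sequences such as \\u002c into the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"Today he passed 23\u002c000 1st class runs."
clean = decode_unicode_escapes(raw)
print(clean)  # Today he passed 23,000 1st class runs.
```

If precision on the slice recovers after decoding, the escapes are a likely cause; if it does not, the slice may simply correlate with some other property of those tweets.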
👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 37.4% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.374 | 374/1000 tested samples (37.4%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1944 | "Who is CNN & the GOP establishment keeping off the stage? Carly Fiorina. 5th in Iowa, 3rd & 5th in NH's recent polls. | "WHO IS CNN & THE GOP ESTABLISHMENT KEEPING OFF THE STAGE? CARLY FIORINA. 5TH IN IOWA, 3RD & 5TH IN NH'S RECENT POLLS. | negative (p = 0.41) | positive (p = 0.41) |
| 8960 | Booker T will be addressing the Hulk Hogan controversy this Saturday night on his Heated Conversations podcast. | BOOKER T WILL BE ADDRESSING THE HULK HOGAN CONTROVERSY THIS SATURDAY NIGHT ON HIS HEATED CONVERSATIONS PODCAST. | negative (p = 0.57) | positive (p = 0.55) |
| 9411 | No school for me and Nat tomorrow. Sucks for you guys !!!! @user | NO SCHOOL FOR ME AND NAT TOMORROW. SUCKS FOR YOU GUYS !!!! @USER | negative (p = 0.40) | positive (p = 0.45) |
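
The fail rate in these robustness checks is the share of samples whose predicted label changes after the transformation. A minimal sketch of that computation follows; `toy_predict` is a hypothetical, deliberately case-sensitive stand-in for the real classifier, and the sample texts are invented for illustration.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        return 0.0
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier used only for this sketch."""
    if "great" in text:
        return "positive"
    if "sucks" in text:
        return "negative"
    return "neutral"


texts = ["what a great day", "this sucks", "see you tomorrow", "GREAT news"]
rate_upper = fail_rate(toy_predict, texts, str.upper)
rate_lower = fail_rate(toy_predict, texts, str.lower)
print(rate_upper)  # 0.5
print(rate_lower)  # 0.25
```

To apply the same check to the actual model, `toy_predict` would be replaced by a function that returns the model's top label for a text.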

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 30.6% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.306 | 306/1000 tested samples (30.6%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 41399 | "what happened at the concert, anything momentous. how was 18, lilo fights, niall's dancing ??c'mon give me the details" | "What Happened At The Concert, Anything Momentous. How Was 18, Lilo Fights, Niall'S Dancing ??C'Mon Give Me The Details" | negative (p = 0.50) | positive (p = 0.58) |
| 12386 | "Red Sox may be losing, but the no hitter is still intact." | "Red Sox May Be Losing, But The No Hitter Is Still Intact." | negative (p = 0.44) | positive (p = 0.37) |
| 14035 | Wait is the jelly and Michael episode that Shawn is on tomorrow new??? | Wait Is The Jelly And Michael Episode That Shawn Is On Tomorrow New??? | negative (p = 0.53) | positive (p = 0.41) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 17.5% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.175 | 175/1000 tested samples (17.5%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 27942 | @user no offense but zayn left on the 25th an my organs are on the g r o u n d | @user nk offense but zayn left on the 25th an my roans are on the g r o u n d | neutral (p = 0.50) | negative (p = 0.53) |
| 43029 | "I can't even remotely believe that I'm going to see Sam Smith in person, with my own two eyes, this Friday." | "I can't even remotely belideve that I'm goking to see Sam Smith in perskon, with my own two eyes, this Friday." | neutral (p = 0.38) | negative (p = 0.51) |
| 589 | @user Are you going to SJP tomorrow night pal? | @user Ars you going to SJP tomorrow night pal? | negative (p = 0.37) | positive (p = 0.40) |

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 14.4% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.144 | 144/1000 tested samples (14.4%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 8965 | "People are reporting that the Destiny free trial for PS4 on the PS Store may well be the complete base game, as... | "people are reporting that the destiny free trial for ps4 on the ps store may well be the complete base game, as... | positive (p = 0.51) | negative (p = 0.42) |
| 10219 | It's Saturday!!! Bring on the Foo Fighters tonight!!! | it's saturday!!! bring on the foo fighters tonight!!! | positive (p = 0.82) | negative (p = 0.70) |
| 9414 | "Took down all my disney star posters (and Twilight, about time) so I can put up all the HP ones I got today from a poster book (100 posters)" | "took down all my disney star posters (and twilight, about time) so i can put up all the hp ones i got today from a poster book (100 posters)" | positive (p = 0.40) | negative (p = 0.52) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.8% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.068 | 68/1000 tested samples (6.8%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 14002 | "Seinfeld broke several conventions of mainstream television. It is often described as being "a show about nothing"" | Seinfeld broke several conventions of mainstream television It is often described as being a show about nothing | negative (p = 0.44) | neutral (p = 0.49) |
| 592 | Spur of the moment....Going to San Diego for the #NFL Game tonight! Psyched! #NHL Ducks Game tomorrow! Great start to the weekend! =) | Spur of the moment Going to San Diego for the #NFL Game tonight Psyched #NHL Ducks Game tomorrow Great start to the weekend =) | negative (p = 0.50) | positive (p = 0.46) |
| 43568 | @user Jersey Shore begins in Spain today !! I can't wait at the night ! | @user Jersey Shore begins in Spain today I can t wait at the night | positive (p = 0.51) | neutral (p = 0.40) |

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
