Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 6 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Performance issues (2)
For records in the dataset where text contains "time", the Precision is 40.94% lower than the global Precision.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | text contains "time" | Precision = 0.350 | -40.94% than global |

Taxonomy
avid-effect:performance:P0204
🔍✨Examples

| | text | label | Predicted label |
---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | Negative | Neutral (p = 0.97) |
35 | "According to Janet Jackson's long time producer Terry Lewis, the album is due in October. STAY CONNECTED!... | Positive | Neutral (p = 0.98) |
65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Positive | Neutral (p = 0.96) |
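
As a rough illustration of how a slice metric like the one above can be computed, the sketch below filters records whose text contains a token and compares precision on that slice with precision on all records. The records are toy data, the helper names are hypothetical, and macro-averaged precision with plain substring matching are assumptions; the report does not state which averaging or matching rule the scan uses.

```python
from collections import defaultdict


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over the classes that were predicted.

    Assumption: macro averaging. The scan's averaging mode is not stated in the report.
    """
    true_pos = defaultdict(int)
    predicted = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_pos[p] += 1
    if not predicted:
        return 0.0
    return sum(true_pos[c] / predicted[c] for c in predicted) / len(predicted)


def slice_contains(records, token):
    """Keep records whose text contains `token` (case-insensitive substring match)."""
    return [r for r in records if token in r["text"].lower()]


# Toy records (hypothetical, not taken from the dataset).
records = [
    {"text": "Great time at the show", "label": "Positive", "pred": "Positive"},
    {"text": "This time it failed", "label": "Negative", "pred": "Neutral"},
    {"text": "Lovely weather", "label": "Positive", "pred": "Positive"},
    {"text": "Long time producer", "label": "Positive", "pred": "Neutral"},
    {"text": "Terrible service", "label": "Negative", "pred": "Negative"},
]

time_slice = slice_contains(records, "time")
slice_p = macro_precision([r["label"] for r in time_slice],
                          [r["pred"] for r in time_slice])
global_p = macro_precision([r["label"] for r in records],
                           [r["pred"] for r in records])
print(f"slice={slice_p:.3f} global={global_p:.3f}")
```

On the toy data the slice scores lower than the full set because two of its three records are predicted Neutral, the same pattern as the examples in the table above.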
For records in the dataset where text contains "tomorrow", the Precision is 8.22% lower than the global Precision.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | text contains "tomorrow" | Precision = 0.544 | -8.22% than global |

Taxonomy
avid-effect:performance:P0204
🔍✨Examples

| | text | label | Predicted label |
---|---|---|---|
62 | But it's a three day weekend and we see Ed Sheeran tomorrow (!!!!!) so things miiiight be looking up. | Positive | Neutral (p = 0.99) |
68 | When I wake up tomorrow I'll be in a different country. Whoa! I didn't run into a David Beckham at the airport. That's a bummer. | Positive | Negative (p = 0.96) |
71 | CINCH YOUR SADDLE is live on Amazon! Only 99 cents until tomorrow evening.Thank you gift! | Positive | Neutral (p = 0.87) |
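
The report does not print the global Precision. Assuming the Deviation column is the relative difference (slice - global) / global, the two slices above should imply roughly the same global value, which gives a quick consistency check on the numbers. The function names below are illustrative only.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric from the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: recover the global metric from a slice metric."""
    return slice_metric / (1.0 + deviation)


# Values copied from the two performance findings (precision, relative deviation).
global_from_time = implied_global(0.350, -0.4094)
global_from_tomorrow = implied_global(0.544, -0.0822)

print(f"{global_from_time:.3f} {global_from_tomorrow:.3f}")
```

Both slices point to a global Precision close to 0.59 under this assumption, so the two findings are consistent with each other.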
👉Robustness issues (4)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 15.43% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | — | Fail rate = 0.154 | 50/324 tested samples (15.43%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | HOLD ON... SAM SMITH MAY DO THE THEME TO SPECTRE!? DOPE!!!!!! #007 #SPECTRE #JAMESBOND | Positive (p = 0.98) | Neutral (p = 0.99) |
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | Neutral (p = 0.81) | Negative (p = 0.68) |
6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @USER @USER ISLAM IS AN ABRAHAMIC FAITH, ANDREW. IT MAY MAKE YOU FEEL A LITTLE UNEASY BUT IT'S THE SAME GOD YOU WORSHIP. SORRY." | Neutral (p = 0.96) | Negative (p = 0.85) |
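
The fail rate above is the share of tested samples whose predicted label changes after the transformation (50/324 ≈ 0.154). A minimal sketch of that check follows. `toy_predict` is a hypothetical stand-in for the real model, and skipping samples that the transformation leaves unchanged is an assumption suggested by the differing denominators across the robustness findings, not something the report states.

```python
from typing import Callable, Sequence, Tuple


def fail_rate(texts: Sequence[str],
              predict: Callable[[str], str],
              transform: Callable[[str], str]) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for a label-invariance check."""
    pairs = [(t, transform(t)) for t in texts]
    # Assumption: only samples actually modified by the transformation are tested.
    tested = [(orig, pert) for orig, pert in pairs if orig != pert]
    changed = sum(1 for orig, pert in tested if predict(orig) != predict(pert))
    rate = changed / len(tested) if tested else 0.0
    return changed, len(tested), rate


def toy_predict(text: str) -> str:
    """Hypothetical classifier, deliberately case-sensitive for two keywords."""
    if "Dope" in text or "best" in text:
        return "Positive"
    if "hate" in text.lower():
        return "Negative"
    return "Neutral"


samples = [
    "Dope!!!!!! #007",
    "I hate international breaks",
    "Sharknado 3 may be the best film",
    "12345",  # unchanged by str.upper, so it is not counted as tested
]

changed, tested, rate = fail_rate(samples, toy_predict, str.upper)
print(changed, tested, round(rate, 3))
```

To run the same check against the real model, `toy_predict` would be replaced with a function that returns the model's top label for a string.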
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 10.26% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | — | Fail rate = 0.103 | 32/312 tested samples (10.26%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper's Worst Offense against Refugees mzy be Climate Recor sas rising temperatures ad to chaos in the Middle East | Negative (p = 0.63) | Neutral (p = 0.50) |
20 | Sharknado 3 may be the best film I've seen yet. #Sharknado3 #America | Sharknado 3 may be the bext film I've seen yet. #Sharknado3 #America | Positive (p = 0.98) | Neutral (p = 0.98) |
21 | Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: Daniel Baldwin may have left the ... | Celebrity Buig Brother: Daniel's viction stirs yup bad felinhgs int he house: Daniel Baldwin may have left tnhe ... | Negative (p = 0.80) | Neutral (p = 0.99) |
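
For readers who want to reproduce a similar perturbation locally, here is a minimal typo-injection sketch. It is not the scan's implementation: the three operations (drop, swap with the next character, double) and the `rate` and `seed` parameters are assumptions chosen to keep the example small and deterministic.

```python
import random


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly drop, swap, or double letters; deterministic for a given seed."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("drop", "swap", "double"))
            if op == "drop":
                i += 1  # skip this letter entirely
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])  # transpose with the next character
                i += 2
                continue
            out.extend([c, c])  # double the letter (also the fallback for a swap at the end)
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)


sentence = "Sharknado 3 may be the best film I've seen yet."
print(add_typos(sentence, rate=0.1, seed=1))
```

Fixing the seed makes a changed prediction reproducible, which helps when deciding whether a flagged sample is a real weakness or a perturbation that altered the meaning.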
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 8.64% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.086 | 28/324 tested samples (8.64%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | Neutral (p = 0.81) | Negative (p = 0.61) |
15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | "More Like Boring Eagles""""""""@Tunnyking: C'Mon Bro, Go Out And Support The Super Eagles #Rt @User I Hate International Breaks" | Negative (p = 0.84) | Neutral (p = 0.59) |
21 | Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: Daniel Baldwin may have left the ... | Celebrity Big Brother: Daniel'S Eviction Stirs Up Bad Feelings In The House: Daniel Baldwin May Have Left The ... | Negative (p = 0.80) | Neutral (p = 0.73) |
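
The perturbed strings above (for example "Daniel'S" and "C'Mon") are consistent with Python's built-in `str.title()`, which capitalizes the letter after every non-letter character, including apostrophes. Whether the scan calls `str.title()` is an assumption. The sketch below compares it with `string.capwords`, which only capitalizes after whitespace.

```python
import string

text = "Daniel's eviction stirs up bad feelings"

title_cased = text.title()           # capitalizes after the apostrophe too
capwords_cased = string.capwords(text)  # capitalizes only after whitespace

print(title_cased)     # Daniel'S Eviction Stirs Up Bad Feelings
print(capwords_cased)  # Daniel's Eviction Stirs Up Bad Feelings

# Hashtags and mentions are affected as well, as in the "#Rt @User" row above.
tags_cased = "#RT @user".title()
print(tags_cased)      # #Rt @User
```

When reviewing these findings, keep in mind that some prediction changes may be caused by artifacts such as "Daniel'S" rather than by title casing in general.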
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.69% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.077 | 23/299 tested samples (7.69%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | Positive (p = 0.98) | Neutral (p = 0.99) |
7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper s Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Negative (p = 0.63) | Neutral (p = 0.51) |
26 | "this adorable old couple in dunkin literally made my day, he's turning 89 tomorrow and talked to me about how he was drafted for the WWII" | this adorable old couple in dunkin literally made my day he s turning 89 tomorrow and talked to me about how he was drafted for the WWII | Positive (p = 0.58) | Neutral (p = 0.69) |
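
The sample rows above suggest that punctuation is replaced by a space and repeated whitespace is then collapsed, while "#" is left in place. The sketch below reproduces rows 2 and 7 under that reading. The character set is an assumption inferred from those rows, not the scan's actual rule.

```python
# Assumption: only these characters are removed; '#' and '@' are kept
# because '#' survives in the sample rows.
PUNCTUATION = ".,!?'\""

_TABLE = str.maketrans({c: " " for c in PUNCTUATION})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_TABLE).split())


row_2 = "Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond"
print(remove_punctuation(row_2))
# Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond
```

Replacing the apostrophe with a space splits contractions ("he's" becomes "he s"), which may explain part of the prediction changes in this group.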
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.