Report for CouchCat/ma_sa_v7_distil
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 5 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Robustness issues (2)
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.42% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | — | Fail rate = 0.144 | 45/312 tested samples (14.42%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @user @user Islam jis an Abrahamic faith, Adnrew. It may make you feel a little unesay btu it's the same God you worsip. Sorry." | negative (p = 0.58) | neutral (p = 0.85) |
15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | "More like borinhg eagles""""""""@Tunnyking: C'mon bro, Go out and support the Wuper Eagles #RT @use I hate internxtional brdaks" | positive (p = 0.75) | neutral (p = 0.97) |
16 | "The BAGRANGI new Pic,Of SALMAN khan That VERY FAMOUS IN PAK CENEMA'S at the 1st day of EID that pic,made 1.5 milion Rs Lolywood/Bolywood" | "The BAGRANGI new Pci,Of SALMAN khan That VERY FAMOUS IN LAK CENEMAS at fthe 1st day of EID that pic,made 1R milion Rs olywood/Bolywood" | negative (p = 0.48) | positive (p = 0.36) |
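The fail rate in the table above is the share of tested samples whose predicted label changes after the perturbation. A minimal sketch of that calculation follows. The `predict` callable and the adjacent-character-swap typo function are hypothetical stand-ins chosen for illustration, not the scanner's own implementation.

```python
import random
from typing import Callable, Sequence


def swap_typo(text: str, rng: random.Random) -> str:
    """Toy perturbation (assumption): swap one random pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    perturb: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after perturbation."""
    if not texts:
        raise ValueError("need at least one sample")
    changed = sum(predict(t) != predict(perturb(t)) for t in texts)
    return changed / len(texts)


# Reproducing the reported figure from the counts in the table.
reported = 45 / 312
print(f"{reported:.4f}")  # 0.1442
```

To try it on a real classifier, wrap the model in a function that maps a string to a label and pass it as `predict`.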
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.35% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.064 | 19/299 tested samples (6.35%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | More like boring eagles@Tunnyking C mon bro Go out and support the Super Eagles #RT @user I hate international breaks | positive (p = 0.75) | neutral (p = 0.50) |
16 | "The BAGRANGI new Pic,Of SALMAN khan That VERY FAMOUS IN PAK CENEMA'S at the 1st day of EID that pic,made 1.5 milion Rs Lolywood/Bolywood" | The BAGRANGI new Pic Of SALMAN khan That VERY FAMOUS IN PAK CENEMA S at the 1st day of EID that pic made 1 5 milion Rs Lolywood Bolywood | negative (p = 0.48) | neutral (p = 0.40) |
29 | Monday at Town Ballroom: RICHIE HAWTIN with LOCO DICE. Dude is so awesome. Tix still avail at | Monday at Town Ballroom RICHIE HAWTIN with LOCO DICE Dude is so awesome Tix still avail at | positive (p = 0.56) | neutral (p = 0.68) |
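The example rows suggest that punctuation is replaced by spaces and repeated whitespace is collapsed, while "#" and "@" are kept. The sketch below is an approximation built on that assumption. It reproduces the perturbed text of rows 16 and 29, but not the way row 15 handles the run of quotation marks, so it should not be read as the scanner's actual transformation.

```python
import string

# Assumption: "#" and "@" survive the transformation, as in the example rows.
KEPT = "#@"
_TABLE = str.maketrans({c: " " for c in string.punctuation if c not in KEPT})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    return " ".join(text.translate(_TABLE).split())


print(remove_punctuation("that pic,made 1.5 milion Rs Lolywood/Bolywood"))
# that pic made 1 5 milion Rs Lolywood Bolywood
```

Note that "1.5" becomes "1 5", so the transformation can change numeric content as well as sentence boundaries, which may explain some of the prediction flips.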
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.095 | 2/21 tested samples (9.52%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
84 | We just got an email warning us that THE POPE IS COMING in two weeks. The next email said that BIEBER IS COMING tomorrow. ACK. | We just got an email warning us that THE imam IS COMING in two weeks. The next email said that BIEBER IS COMING tomorrow. ACK. | neutral (p = 0.71) | negative (p = 0.53) |
111 | Saudi Arabia is like Narnia to me. I may HAVE to go to Mecca one day to fulfil my religious obligation but already dreading being there. | Saudi Arabia is like Narnia to me. I may HAVE to go to kumbh mela one day to fulfil my religious obligation but already dreading being there. | neutral (p = 0.50) | negative (p = 0.62) |
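This kind of check swaps a religion-related term for a counterpart and compares predictions before and after. A minimal sketch is shown below. The substitution table is hypothetical and contains only the two pairs visible in the example rows; the scanner's real word list is not given in this report.

```python
import re

# Hypothetical substitution table; both pairs come from the example rows above.
SWAPS = {"pope": "imam", "mecca": "kumbh mela"}
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def switch_terms(text: str) -> str:
    """Replace each listed term (case-insensitive, whole word) with its counterpart."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(0).lower()], text)


print(switch_terms("THE POPE IS COMING in two weeks."))
# THE imam IS COMING in two weeks.
```

With only 21 tested samples, a single changed prediction moves the fail rate by about 4.8 percentage points (1/21), so the 9.52% figure rests on two samples and is worth checking by hand.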
👉Performance issues (2)
For records in the dataset where text contains "time", the Precision is 19.5% lower than the global Precision.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | text contains "time" | Precision = 0.400 | -19.50% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
index | text | label | Predicted label |
---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | negative | neutral (p = 0.70) |
35 | "According to Janet Jackson's long time producer Terry Lewis, the album is due in October. STAY CONNECTED!... | positive | neutral (p = 0.86) |
65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive | negative (p = 0.44) |
For records in the dataset where text contains "going", the Precision is 10.56% lower than the global Precision.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | text contains "going" | Precision = 0.444 | -10.56% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
index | text | label | Predicted label |
---|---|---|---|
60 | Btw fuck Durant for going to the OKlahoma game Saturday!! You went to Texas!!! #LonghornForLife | negative | neutral (p = 0.98) |
126 | im going to b so pissed if ikon doesn't debut on sept 15th can YG STOP PULLING A FRANK OCEAN ON US | negative | neutral (p = 0.94) |
131 | @user digi was on the 18th but i didn't go but im going to slaybells | positive | neutral (p = 0.98) |
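The report does not state the global Precision, nor whether the deviations are relative or measured in percentage points. The sketch below assumes they are relative, that is, (slice - global) / global, and back-computes the global value each slice would imply. Both function names are illustrative.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric from the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Global metric implied by a slice metric and its relative deviation."""
    return slice_metric / (1 + deviation)


# Slice precision and deviation as reported in the two tables above.
for name, precision, deviation in [("time", 0.400, -0.1950), ("going", 0.444, -0.1056)]:
    print(name, round(implied_global(precision, deviation), 3))
# time 0.497
# going 0.496
```

Under this assumption both slices point to a global Precision of roughly 0.50, which is consistent with the relative reading. Treating the deviations as percentage points would instead imply two different global values (about 0.595 and 0.550).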
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.