---
pipeline_tag: text-classification
language: multilingual
license: apache-2.0
tags:
- "sentiment-analysis"
- "multilingual"
widget:
- text: "I am very happy."
  example_title: "English"
- text: "Heute bin ich schlecht drauf."
  example_title: "Deutsch"
- text: "Quel cauchemard!"
  example_title: "Francais"
- text: "ฉันรักฤดูใบไม้ผลิ"
  example_title: "ภาษาไทย"
---
# Multi-lingual sentiment prediction trained on COVID19-related tweets
Repository: [https://github.com/clampert/multilingual-sentiment-analysis/](https://github.com/clampert/multilingual-sentiment-analysis/)
The model was trained on a large-scale dataset (18,437,530 examples)
of multi-lingual tweets collected between March 2020
and November 2021 using Twitter’s Streaming API with varying
COVID19-related keywords. Labels were auto-generated based on
the presence of positive and negative emoticons. For details
on the dataset, see our IEEE BigData 2021 publication.
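
To make the idea of emoticon-based labelling concrete, here is a minimal sketch. It is not the procedure used to build the dataset: the emoticon sets, the `auto_label` helper, and the rule of skipping tweets with no emoticons or with mixed emoticons are all assumptions made for illustration. See the publication for the actual method.

```python
from typing import Optional

# Hypothetical emoticon sets, chosen only for illustration.
# The sets used to build the real dataset are not given in this card.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE_EMOTICONS = {":(", ":-(", "😢", "😠"}


def auto_label(text: str) -> Optional[str]:
    """Return 'positive' or 'negative' based on emoticons found in text.

    Returns None when the text has no known emoticon or contains both
    kinds (an assumed rule for ambiguous cases).
    """
    has_positive = any(e in text for e in POSITIVE_EMOTICONS)
    has_negative = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_positive == has_negative:
        return None  # no signal, or conflicting signals
    return "positive" if has_positive else "negative"
```

With a rule like this, only tweets that carry an unambiguous emoticon receive a label, which is why no manual annotation is needed.
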
The base model is [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual).
It was fine-tuned for sequence classification with `positive`
and `negative` labels for two epochs (48 hours on 8xP100 GPUs).
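
A minimal usage sketch with the `transformers` text-classification pipeline follows. The Hub identifier in `MODEL_ID` is an assumption (it does not appear in this card), so replace it with the identifier of this repository if it differs. The helper names `load_classifier` and `classify` are hypothetical, and the exact label strings returned depend on the model configuration.

```python
from typing import Callable, List, Tuple

# Assumed Hub identifier; not stated in this card. Replace if it differs.
MODEL_ID = "clampert/multilingual-sentiment-covid19"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Run the classifier and return (text, label, score) triples."""
    results = classifier(texts)
    return [(t, r["label"], float(r["score"])) for t, r in zip(texts, results)]


# Example (needs network access and the `transformers` package):
#   clf = load_classifier()
#   for text, label, score in classify(
#       ["I am very happy.", "Heute bin ich schlecht drauf."], clf
#   ):
#       print(f"{label:>8}  {score:.3f}  {text}")
```

Keeping `classify` independent of the pipeline object makes it easy to swap in a different model or a stub when testing.
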
## Citation
If you use our model in your work, please cite:
```
@inproceedings{lampert2021overcoming,
title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis},
author={Jasmin Lampert and Christoph H. Lampert},
booktitle={IEEE International Conference on Big Data (BigData)},
year={2021},
note={Special Session: Machine Learning on Big Data},
}
```
Enjoy!