---
pipeline_tag: sentiment-analysis
language: multilingual
license: apache-2.0
tags:
- "sentiment-analysis"
- "multilingual"
---

# Multi-lingual sentiment prediction trained from COVID19-related tweets

Repository: [https://github.com/clampert/multilingual-sentiment-analysis/](https://github.com/clampert/multilingual-sentiment-analysis/)

The model was trained on a large-scale dataset of 18,437,530
multilingual tweets collected between March 2020 and November 2021
using Twitter’s Streaming API with varying COVID19-related keywords.
Labels were auto-generated based on the presence of positive and
negative emoticons. For details on the dataset, see our IEEE BigData
2021 publication.

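The emoticon-based labeling described above is a form of distant supervision. A minimal sketch of such a rule follows, assuming a tweet is labeled `positive` if it contains only positive emoticons, `negative` if it contains only negative ones, and is left unlabeled otherwise. The emoticon sets, the `emoticon_label` helper, and the handling of mixed or emoticon-free tweets are hypothetical; the exact rules used to build the dataset are not specified in this model card.

```python
from typing import Optional

# Hypothetical emoticon sets for illustration only; the sets actually used
# to build the dataset are not listed in this model card.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE_EMOTICONS = {":(", ":-(", "😢", "😞"}


def emoticon_label(tweet: str) -> Optional[str]:
    """Return "positive" or "negative" based on emoticons in the tweet.

    Assumption: tweets containing emoticons of both polarities, or none
    at all, are treated as unlabeled and get None.
    """
    # Whitespace tokenization keeps the sketch simple; emoticons glued to
    # words (e.g. "great:)") are not detected.
    tokens = set(tweet.split())
    has_pos = bool(tokens & POSITIVE_EMOTICONS)
    has_neg = bool(tokens & NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


if __name__ == "__main__":
    print(emoticon_label("Vaccine appointment booked :)"))
    print(emoticon_label("Another lockdown 😞"))
    print(emoticon_label("Stay safe everyone"))
```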
The base model is [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual).
It was fine-tuned for sequence classification with `positive`
and `negative` labels for two epochs (48 hours on 8xP100 GPUs).

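With two labels, a sequence-classification head produces one logit per label, and a prediction is typically read off with a softmax followed by an argmax. The pure-Python sketch below shows that step only; the `decode_logits` helper and the index-to-label mapping are hypothetical, and the actual mapping should be taken from the model's own configuration (`id2label` in its `config.json`).

```python
import math
from typing import Dict, List, Tuple


def decode_logits(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Map raw classifier logits to (label, probability) via softmax + argmax."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


if __name__ == "__main__":
    # Hypothetical index-to-label mapping; check the model's config.json
    # for the actual order before relying on it.
    id2label = {0: "negative", 1: "positive"}
    label, prob = decode_logits([-1.2, 2.3], id2label)
    print(label, round(prob, 3))
```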
## Citation

If you use our model in your work, please cite:

```
@inproceedings{lampert2021overcoming,
  title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis},
  author={Jasmin Lampert and Christoph H. Lampert},
  booktitle={IEEE International Conference on Big Data (BigData)},
  year={2021},
  note={Special Session: Machine Learning on Big Data},
}
```

Enjoy!