Update README.md
README.md
CHANGED
@@ -49,6 +49,19 @@ tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')
 
 ## Downstream Performance
 
+### Daily News Dikgang
+
+Learn more about the dataset in the [Dataset Folder](daily-news-dikgang)
+
+| **Model** | **5-fold Cross Validation F1** | **Test F1** |
+|-----------------------------|--------------------------------------|-------------------|
+| Logistic Regression + TFIDF | 60.1 | 56.2 |
+| NCHLT TSN RoBERTa | 64.7 | 60.3 |
+| PuoBERTa | **63.8** | **62.9** |
+| PuoBERTaJW300 | 66.2 | 65.4 |
+
+Downstream News Categorisation model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-News](https://huggingface.co/dsfsi/PuoBERTa-News)
+
 ### MasakhaPOS
 
 Performance of models on the MasakhaPOS downstream task.
@@ -65,6 +78,8 @@ Performance of models on the MasakhaPOS downstream task.
 | PuoBERTa | **83.4** |
 | PuoBERTa+JW300 | 84.1 |
 
+Downstream POS model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-POS](https://huggingface.co/dsfsi/PuoBERTa-POS)
+
 ### MasakhaNER
 
 Performance of models on the MasakhaNER downstream task.
@@ -80,13 +95,17 @@ Performance of models on the MasakhaNER downstream task.
 | PuoBERTa | **78.2** |
 | PuoBERTa+JW300 | 80.2 |
 
-
+Downstream NER model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-NER](https://huggingface.co/dsfsi/PuoBERTa-NER)
+
+## Pre-Training Dataset
+
+We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.
 
-
+[Github](https://github.com/dsfsi/PuoData), 🤗 [https://huggingface.co/datasets/dsfsi/PuoData](https://huggingface.co/datasets/dsfsi/PuoData)
 
 ## Citation Information
 
-Bibtex
+Bibtex Reference
 
 ```
 @inproceedings{marivate2023puoberta,