vukosi committed
Commit bb1e53c · 1 Parent(s): c0b773a

Update README.md

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -49,6 +49,19 @@ tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')
 
 ## Downstream Performance
 
 ### MasakhaPOS
 
 Performance of models on the MasakhaPOS downstream task.
@@ -65,6 +78,8 @@ Performance of models on the MasakhaPOS downstream task.
 | PuoBERTa | **83.4** |
 | PuoBERTa+JW300 | 84.1 |
 
 ### MasakhaNER
 
 Performance of models on the MasakhaNER downstream task.
@@ -80,13 +95,17 @@ Performance of models on the MasakhaNER downstream task.
 | PuoBERTa | **78.2** |
 | PuoBERTa+JW300 | 80.2 |
 
- ## Dataset
 
- We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.\\
 
 ## Citation Information
 
- Bibtex Refrence
 
 ```
 @inproceedings{marivate2023puoberta,
 
@@ -49,6 +49,19 @@ tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')
 
 ## Downstream Performance
 
+ ### Daily News Dikgang
+
+ Learn more about the dataset in the [Dataset Folder](daily-news-dikgang)
+
+ | **Model** | **5-fold Cross Validation F1** | **Test F1** |
+ |-----------------------------|--------------------------------------|-------------------|
+ | Logistic Regression + TFIDF | 60.1 | 56.2 |
+ | NCHLT TSN RoBERTa | 64.7 | 60.3 |
+ | PuoBERTa | **63.8** | **62.9** |
+ | PuoBERTaJW300 | 66.2 | 65.4 |
+
+ Downstream News Categorisation model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-News](https://huggingface.co/dsfsi/PuoBERTa-News)
+
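A minimal usage sketch for the linked news-categorisation checkpoint, assuming it is a standard 🤗 `transformers` sequence-classification model whose label names come from its config; the input string is only a placeholder for real Setswana news text:

```python
# Usage sketch (assumption: dsfsi/PuoBERTa-News is a standard
# sequence-classification checkpoint; labels come from its config).
from transformers import pipeline

news_classifier = pipeline("text-classification", model="dsfsi/PuoBERTa-News")

sample = "(Setswana news paragraph goes here)"  # placeholder input
print(news_classifier(sample))  # e.g. [{'label': '...', 'score': 0.9}]
```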
 ### MasakhaPOS
 
 Performance of models on the MasakhaPOS downstream task.
@@ -65,6 +78,8 @@ Performance of models on the MasakhaPOS downstream task.
 | PuoBERTa | **83.4** |
 | PuoBERTa+JW300 | 84.1 |
 
+ Downstream POS model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-POS](https://huggingface.co/dsfsi/PuoBERTa-POS)
+
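Likewise, a hedged sketch for the linked POS checkpoint, assuming it is a standard token-classification model fine-tuned for the MasakhaPOS tag set:

```python
# Usage sketch (assumption: dsfsi/PuoBERTa-POS is a standard
# token-classification checkpoint for MasakhaPOS-style tags).
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("dsfsi/PuoBERTa-POS")
model = AutoModelForTokenClassification.from_pretrained("dsfsi/PuoBERTa-POS")

pos_tagger = pipeline("token-classification", model=model, tokenizer=tokenizer)
print(pos_tagger("(Setswana sentence goes here)"))  # one prediction per token
```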
 ### MasakhaNER
 
 Performance of models on the MasakhaNER downstream task.
@@ -80,13 +95,17 @@ Performance of models on the MasakhaNER downstream task.
 | PuoBERTa | **78.2** |
 | PuoBERTa+JW300 | 80.2 |
 
+ Downstream NER model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-NER](https://huggingface.co/dsfsi/PuoBERTa-NER)
+
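And a sketch for the linked NER checkpoint, again assuming a standard token-classification head; `aggregation_strategy="simple"` merges sub-word pieces into whole entity spans:

```python
# Usage sketch (assumption: dsfsi/PuoBERTa-NER is a standard
# token-classification checkpoint for MasakhaNER-style entities).
from transformers import pipeline

ner_tagger = pipeline(
    "token-classification",
    model="dsfsi/PuoBERTa-NER",
    aggregation_strategy="simple",  # merge sub-word pieces into entity spans
)
print(ner_tagger("(Setswana sentence naming people and places)"))
```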
+ ## Pre-Training Dataset
+
+ We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.
 
+ [Github](https://github.com/dsfsi/PuoData), 🤗 [https://huggingface.co/datasets/dsfsi/PuoData](https://huggingface.co/datasets/dsfsi/PuoData)
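A sketch of pulling PuoData from the Hub with the 🤗 `datasets` library, assuming it loads with its default configuration; the actual split and column names are documented on the dataset card:

```python
# Usage sketch (assumption: the dataset loads with its default configuration;
# see the dataset card for the real split and column names).
from datasets import load_dataset

puodata = load_dataset("dsfsi/PuoData")
print(puodata)  # prints the available splits and their sizes
```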
 
 ## Citation Information
 
+ Bibtex Reference
 
 ```
 @inproceedings{marivate2023puoberta,