Added more performance metrics to README
README.md CHANGED
@@ -82,7 +82,7 @@ The code used to train the model can be found on GitHub:
 
 The research paper can be found here: [ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis](https://github.com/jbeno/sentiment/research_paper.pdf)
 
-### Performance
+### Performance Summary
 
 - **Merged Dataset**
   - Macro Average F1: **79.29**
@@ -97,7 +97,6 @@ The research paper can be found here: [ELECTRA and GPT-4o: Cost-Effective Partne
   - Macro Average F1: **69.95**
   - Accuracy: **78.24**
 
-
 ## Model Architecture
 
 - **Base Model**: ELECTRA base discriminator (`google/electra-base-discriminator`)
@@ -255,6 +254,112 @@ The model's configuration (config.json) includes custom parameters:
 - `dropout_rate`: Dropout rate used in the classifier.
 - `pooling`: Pooling strategy used ('mean').
 
+## Performance by Dataset
+
+### Merged Dataset
+
+```
+Merged Dataset Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.847081  0.777211  0.810643      2352
+     neutral   0.704453  0.761072  0.731669      1829
+    positive   0.828047  0.844615  0.836249      2349
+
+    accuracy                       0.796937      6530
+   macro avg   0.793194  0.794299  0.792854      6530
+weighted avg   0.800285  0.796937  0.797734      6530
+
+ROC AUC: 0.926344
+
+Predicted  negative  neutral  positive
+Actual
+negative       1828      331       193
+neutral         218     1392       219
+positive        112      253      1984
+
+Macro F1 Score: 0.79
+```
+
+### DynaSent Round 1
+
+```
+DynaSent Round 1 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.901222  0.737500  0.811182      1200
+     neutral   0.745957  0.922500  0.824888      1200
+    positive   0.850970  0.804167  0.826907      1200
+
+    accuracy                       0.821389      3600
+   macro avg   0.832716  0.821389  0.820992      3600
+weighted avg   0.832716  0.821389  0.820992      3600
+
+ROC AUC: 0.945131
+
+Predicted  negative  neutral  positive
+Actual
+negative        885      201       114
+neutral          38     1107        55
+positive         59      176       965
+
+Macro F1 Score: 0.82
+```
+
+### DynaSent Round 2
+
+```
+DynaSent Round 2 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.696154  0.754167  0.724000       240
+     neutral   0.770408  0.629167  0.692661       240
+    positive   0.704545  0.775000  0.738095       240
+
+    accuracy                       0.719444       720
+   macro avg   0.723702  0.719444  0.718252       720
+weighted avg   0.723702  0.719444  0.718252       720
+
+ROC AUC: 0.88842
+
+Predicted  negative  neutral  positive
+Actual
+negative        181       26        33
+neutral          44      151        45
+positive         35       19       186
+
+Macro F1 Score: 0.72
+```
+
+### Stanford Sentiment Treebank (SST-3)
+
+```
+SST-3 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.831878  0.835526  0.833698       912
+     neutral   0.452703  0.344473  0.391241       389
+    positive   0.834669  0.916392  0.873623       909
+
+    accuracy                       0.782353      2210
+   macro avg   0.706417  0.698797  0.699521      2210
+weighted avg   0.766284  0.782353  0.772239      2210
+
+ROC AUC: 0.885009
+
+Predicted  negative  neutral  positive
+Actual
+negative        762      104        46
+neutral         136      134       119
+positive         18       58       833
+
+Macro F1 Score: 0.70
+```
+
 ## License
 
 This model is licensed under the MIT License.
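The per-dataset reports in this change follow scikit-learn's plain-text report format. A minimal sketch of how such a report is typically produced, using toy label arrays rather than the model's real predictions (the `y_true`/`y_pred` values below are illustrative only, not the README's numbers):

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score

labels = ["negative", "neutral", "positive"]

# Toy stand-ins: in real use, y_true comes from the test split and
# y_pred from the model's predicted classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Per-class precision/recall/F1 plus macro and weighted averages,
# printed with six decimals to match the tables above.
print(classification_report(y_true, y_pred, target_names=labels, digits=6))

# Confusion matrix: rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))

print(f"Macro F1 Score: {f1_score(y_true, y_pred, average='macro'):.2f}")
```

The ROC AUC lines require class probabilities rather than hard predictions; for a three-class problem they can be computed with `roc_auc_score(y_true, probs, multi_class="ovr", average="macro")`.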