Update README.md
README.md
CHANGED
@@ -1,3 +1,67 @@
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- ancient-greek
- koine-greek
- sentiment-analysis
- regression
- digital-humanities
base_model: pranaydeeps/Ancient-Greek-BERT
datasets:
- custom
---

# Ancient Greek Valence BERT

This is a `pranaydeeps/Ancient-Greek-BERT` model fine-tuned for valence regression on Ancient Greek texts. It is not a classifier: it predicts a continuous sentiment **valence score** ranging from **-1.0 (most negative) to +1.0 (most positive)**.

This model was developed as part of a Ph.D. dissertation at Yonsei University on sentiment analysis of the Pauline epistles.

## Model Description

The model is intended for academic use in Digital Humanities, Classics, and New Testament studies to analyze the sentiment polarity of texts in Koine and Homeric Greek. It takes a Greek sentence as input and returns a single regression score.

## Training and Evaluation

### Training Data

The model was trained on a custom-built corpus of 693 samples that combines two sources:
1. **Homeric Greek Dataset**: The sentiment dataset from the *Iliad*, developed by Pavlopoulos et al. (2022).
2. **New Testament (Koine Greek) Dataset**: A new corpus annotated by a panel of eight New Testament studies experts from Yonsei University. This smaller dataset was expanded using back-translation and generative augmentation to balance the training pool.

All training data and scripts are available at the [GitHub repository](https://github.com/luvnpce83/koine-greek-sentiment-analysis).

### Training Procedure

The model was fine-tuned for a regression task using a Mean Squared Error (MSE) loss function. Key hyperparameters include a learning rate of 5e-5, a batch size of 32, and the AdamW optimizer. Training used early stopping based on the Spearman correlation on a validation set.
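
The two ingredients named above, an MSE loss and early stopping on a validation metric, can be illustrated with a small framework-free sketch. The function names, the `patience` value, and all numbers below are assumptions made for illustration; they are not taken from the training scripts.

```python
def mse_loss(predictions, targets):
    """Mean squared error between predicted and gold valence scores."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def best_epoch_with_early_stopping(val_scores, patience=3):
    """Return (epoch index, score) of the best epoch.

    `val_scores` holds one validation metric per epoch (higher is better),
    for example the Spearman correlation on the validation set. Iteration
    stops once the metric has failed to improve for `patience` epochs in a row.
    """
    best_epoch, best_score, epochs_without_gain = -1, float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score, epochs_without_gain = epoch, score, 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # stop here; the checkpoint from best_epoch is kept
    return best_epoch, best_score


# Hypothetical numbers, for illustration only.
loss = mse_loss([0.5, -0.25], [1.0, -0.75])
history = [0.41, 0.55, 0.60, 0.58, 0.59, 0.57, 0.66]
stopped_at = best_epoch_with_early_stopping(history, patience=3)
```

In this toy history the loop stops after three epochs without improvement and keeps epoch 2, so the later, higher value is never seen. That is the usual trade-off of a small `patience`.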

### Evaluation Results

The model's performance was evaluated on two separate, unseen test sets. The results are consistent across both domains, with Pearson and Spearman correlations of roughly 0.63 to 0.64 on each.

| Test Set      | Pearson Correlation | Spearman Correlation |
|---------------|---------------------|----------------------|
| New Testament | 0.643               | 0.629                |
| Homeric       | 0.639               | 0.628                |
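
For readers less familiar with the two metrics in the table, the following is a minimal pure-Python sketch of how Pearson and Spearman correlations are computed from gold and predicted scores. The helper names and the sample values are hypothetical; this is not the evaluation code used for the reported numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold and predicted valence scores, for illustration only.
gold = [-0.8, -0.2, 0.1, 0.5, 0.9]
pred = [-0.5, -0.3, 0.2, 0.1, 0.7]
r = pearson(gold, pred)
rho = spearman(gold, pred)
```

Pearson measures how well the predictions follow the gold scores linearly, while Spearman only asks whether the ordering of sentences by valence is preserved.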

## How to Use

You can use this model with the `transformers` library pipeline for sentiment analysis. Since this is a regression model, the output will be a raw score, not a "POSITIVE" or "NEGATIVE" label.

```python
from transformers import pipeline

# Load the model from the Hub
valence_analyzer = pipeline("sentiment-analysis", model="luvnpce8d3/ancient-greek-valence-bert")

# Analyze an example sentence
text = "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"  # "Sing, Goddess, the wrath of Peleus' son Achilles"
result = valence_analyzer(text)

# The result is a score from -1.0 to 1.0
print(result)
# Example output: [{'label': 'LABEL_0', 'score': 0.375}] -> a positive valence score
```