---
library_name: tensorflow
tags:
- sentiment-analysis
- aspect-based-sentiment-analysis
- tensorflow
- keras
language:
- tr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- Sengil/Turkish-ABSA-Wsynthetic
---

# 🇹🇷 Turkish Aspect-Based Sentiment Analysis (ABSA) – BiLSTM + Word2Vec

This model performs aspect-based sentiment analysis (ABSA) on Turkish sentences. Given a sentence and a specific aspect, it predicts the sentiment polarity (Negative, Neutral, Positive) associated with that aspect.

## 🧠 Model Details

- **Model Type:** BiLSTM (Bidirectional Long Short-Term Memory) + Word2Vec
- **Developer:** [Sengil](https://huggingface.co/Sengil)
- **Library:** Keras
- **Input Format:** `"Sentence [ASP] Aspect"`
- **Labels:** 0 = Negative, 1 = Neutral, 2 = Positive
- **Training Dataset:** [Sengil/Turkish-ABSA-Wsynthetic](https://huggingface.co/datasets/Sengil/Turkish-ABSA-Wsynthetic)

## 📊 Evaluation Results

The model achieved the following performance on the test set:

| Class       | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| Negative    | 0.89      | 0.91   | 0.90     | 896     |
| Neutral     | 0.70      | 0.64   | 0.67     | 140     |
| Positive    | 0.92      | 0.92   | 0.92     | 1178    |
| **Overall** |           |        | **0.90** | 2214    |

- **Overall Accuracy:** 90%
- **Macro-Averaged F1-Score:** 83%
- **Weighted-Averaged F1-Score:** 90%

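Per-class tables of this form can be reproduced with scikit-learn's classification report. A minimal sketch, assuming hypothetical arrays `y_true` (gold labels) and `y_pred` (model predictions) for the test set; scikit-learn is used here only for evaluation and is not a dependency of the model itself:

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical stand-ins; in practice y_true comes from the test set
# and y_pred from np.argmax(model.predict(...), axis=1)
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

print(classification_report(y_true, y_pred,
                            target_names=["Negative", "Neutral", "Positive"],
                            digits=2))
```
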
## 🚀 Usage Example

```python
import pickle
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the model and tokenizer
model = load_model("absa_bilstm_model.keras")
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

# Maximum sentence length used during training
max_len = 84  # Adjust this value based on your training configuration

# Prediction function
def predict_sentiment(sentence, aspect):
    input_text = f"{sentence} [ASP] {aspect}"
    sequence = tokenizer.texts_to_sequences([input_text])
    padded_sequence = pad_sequences(sequence, maxlen=max_len, padding='post')
    prediction = model.predict(padded_sequence)
    label = np.argmax(prediction, axis=1)[0]
    labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
    return labels[label]

# Example usage
sentence = "Manzara şahane evet ama servis rezalet."  # "The view is wonderful, but the service is terrible."
aspect = "Servis"
print(f"Sentiment for '{aspect}': {predict_sentiment(sentence, aspect)}")
```

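If the model files are not already present locally, they can be fetched from the Hub first. A minimal sketch using `huggingface_hub`, assuming the repository ID from the citation below and the file names used above:

```python
from huggingface_hub import hf_hub_download

# Download the trained model and the fitted tokenizer from the Hub
model_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec",
                             filename="absa_bilstm_model.keras")
tokenizer_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec",
                                 filename="tokenizer.pkl")

# The returned local paths can then be passed to load_model / pickle.load as above
```
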
## 🏋️‍♂️ Training Details

* **Embedding:** Word2Vec (dimension: 100)
* **Model Architecture** (see the sketch after this list):
  * Embedding layer (initialized with pre-trained Word2Vec weights)
  * 2 x BiLSTM layers (each with 100 units, dropout: 0.3)
  * Conv1D layer (100 filters, kernel size: 5)
  * Global Max Pooling
  * Dense layer (16 units, ReLU activation)
  * Output layer (3 units, softmax activation)
* **Training Parameters:**
  * Loss Function: `sparse_categorical_crossentropy`
  * Optimizer: Adam
  * Epochs: 35 (with early stopping)
  * Batch Size: 128
  * Learning Rate: 1e-3 (adjusted dynamically with ReduceLROnPlateau)

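The architecture above corresponds roughly to the following Keras sketch. This is a reconstruction from the bullet list, not the original training script; the vocabulary size, the placeholder embedding matrix, the frozen embeddings, the Conv1D activation, and the callback settings are all assumptions:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM, Conv1D,
                                     GlobalMaxPooling1D, Dense)
from tensorflow.keras.initializers import Constant
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

vocab_size = 20000     # assumption: in practice, len(tokenizer.word_index) + 1
embedding_dim = 100    # Word2Vec dimension from the card
max_len = 84           # input length from the usage example

# Placeholder embedding matrix; in practice, fill rows with Word2Vec vectors
embedding_matrix = np.zeros((vocab_size, embedding_dim))

model = Sequential([
    Embedding(vocab_size, embedding_dim,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),  # assumption: pre-trained embeddings stay frozen
    Bidirectional(LSTM(100, return_sequences=True, dropout=0.3)),
    Bidirectional(LSTM(100, return_sequences=True, dropout=0.3)),
    Conv1D(100, kernel_size=5, activation="relu"),
    GlobalMaxPooling1D(),
    Dense(16, activation="relu"),
    Dense(3, activation="softmax"),
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=Adam(learning_rate=1e-3),
              metrics=["accuracy"])

callbacks = [
    EarlyStopping(patience=3, restore_best_weights=True),  # patience is an assumption
    ReduceLROnPlateau(factor=0.5, patience=2),             # settings are assumptions
]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=35, batch_size=128, callbacks=callbacks)
```
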
## 📚 Training Data

The model was trained on the [Sengil/Turkish-ABSA-Wsynthetic](https://huggingface.co/datasets/Sengil/Turkish-ABSA-Wsynthetic) dataset, which comprises semi-synthetic Turkish sentences annotated for aspect-based sentiment analysis, primarily in the restaurant domain.

## ⚠️ Limitations

* Performance on the Neutral class is lower than on the other classes (F1 of 0.67 vs. 0.90 and 0.92), likely due to class imbalance in the training data.
* The model may struggle with rare or ambiguous aspects that are underrepresented in the training set.
* Complex sentence structures and ironic expressions may reduce the model's accuracy.

## 📌 Citation

```bibtex
@misc{turkish_absa_bilstm_word2vec,
  title  = {Turkish Aspect-Based Sentiment Analysis using BiLSTM + Word2Vec},
  author = {Sengil},
  year   = {2025},
  url    = {https://huggingface.co/Sengil/Turkish-ABSA-BiLSTM-Word2Vec}
}
```

## 💬 Contact

For questions or feedback, please reach out via the author's [Hugging Face profile](https://huggingface.co/Sengil).