Sengil/t5-turkish-aspect-term-extractor 🇹🇷
A Turkish sequence-to-sequence model based on Turkish-NLP/t5-efficient-base-turkish, fine-tuned for Aspect Term Extraction (ATE) on customer reviews and other sentences.
Given a Turkish sentence, the model generates a list of aspect terms (e.g., kahve "coffee", servis "service", fiyatlar "prices") naming the main entities or features being discussed.
✨ Example
```python
from collections import Counter
import re

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer
MODEL_ID = "Sengil/t5-turkish-aspect-term-extractor"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(DEVICE)
model.eval()

# Function words that should never be returned as aspect terms
TURKISH_STOPWORDS = {
    "ve", "çok", "ama", "bir", "bu", "daha", "gibi", "ile", "için",
    "de", "da", "ki", "o", "şu", "sen", "biz", "siz", "onlar",
}


def is_valid_aspect(word):
    """Keep single alphabetic tokens longer than one character that are not stopwords."""
    word = word.strip().lower()
    return len(word) > 1 and word not in TURKISH_STOPWORDS and word.isalpha()


def extract_and_rank_aspects(text, max_tokens=64, beams=5):
    """Generate `beams` candidate term lists and rank terms by how often they appear."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_tokens,
            num_beams=beams,
            num_return_sequences=beams,  # return every beam, not only the best one
            early_stopping=True,
        )
    all_predictions = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]

    all_terms = []
    for pred in all_predictions:
        # Split on commas, semicolons, en/em dashes and hyphens
        candidates = re.split(r"[;,–—-]", pred)
        all_terms.extend(w.strip().lower() for w in candidates if is_valid_aspect(w))

    # A term's score is the number of times it appears across all returned beams
    return Counter(all_terms).most_common()


# Inference
# English: "Pros: a great atmosphere with a lake view, a restaurant with good air conditioning
# because of Ipoh's always-hot weather, waiters who provide good and fast service, an e-wallet
# that accepts contactless payment, free parking but open under the hot sun, the food tastes good."
text = "Artılar: Göl manzarasıyla harika bir atmosfer, Ipoh'un her zaman sıcak olan havası nedeniyle iyi bir klima olan restoran, iyi ve hızlı hizmet sunan garsonlar, temassız ödeme kabul eden e-cüzdan, ücretsiz otopark ama sıcak güneş altında açık, yemeklerin tadı güzel."
ranked_aspects = extract_and_rank_aspects(text)

print("Sorted Aspect Terms:")
for term, score in ranked_aspects:
    print(f"{term:<15} skor: {score}")  # "skor" = score
```
Output:
```text
Sorted Aspect Terms:
atmosfer        skor: 1
servis          skor: 1
restoran        skor: 1
hizmet          skor: 1
```
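The `is_valid_aspect` filter above keeps only single alphabetic tokens, so any multi-word or hyphenated term the model might emit (e.g. "göl manzarası", "e-cüzdan") is dropped, and the hyphen split turns "e-cüzdan" into "cüzdan". If you want to keep such terms, a pure-Python post-processing step along these lines could replace the splitting and filtering. This is a sketch under assumptions: the decoded beams are comma- or semicolon-separated like the training targets, and `is_valid_term`, `rank_aspects_lenient` and `STOPWORDS` are hypothetical names, not part of the model card's code. Unlike the original, it counts each term at most once per beam, so a score means "number of beams containing the term".

```python
import re
from collections import Counter

# Same stopword list as in the example above (hypothetical helper set)
STOPWORDS = {"ve", "çok", "ama", "bir", "bu", "daha", "gibi", "ile", "için",
             "de", "da", "ki", "o", "şu", "sen", "biz", "siz", "onlar"}


def is_valid_term(term: str) -> bool:
    """Accept terms whose space- or hyphen-separated parts are all alphabetic."""
    parts = [p for p in re.split(r"[\s-]+", term) if p]
    return (
        len(term) > 1
        and term not in STOPWORDS
        and bool(parts)
        and all(p.isalpha() for p in parts)
    )


def rank_aspects_lenient(predictions):
    """Split each decoded beam on list separators only and count terms across beams."""
    counts = Counter()
    for pred in predictions:
        # dict.fromkeys de-duplicates within a beam while keeping a stable order
        terms = dict.fromkeys(t.strip().lower() for t in re.split(r"[;,]", pred))
        counts.update(t for t in terms if is_valid_term(t))
    return counts.most_common()


# Hypothetical decoded beams, standing in for real model output
beams = ["atmosfer, servis, e-cüzdan", "atmosfer, göl manzarası", "atmosfer, ve"]
for term, score in rank_aspects_lenient(beams):
    print(f"{term:<15} score: {score}")
```

In the existing function, you would pass `all_predictions` to `rank_aspects_lenient` instead of running the `re.split` / `is_valid_aspect` loop.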
📌 Model Details
Detail | Value |
---|---|
Model Type | AutoModelForSeq2SeqLM (T5-style) |
Base Model | Turkish-NLP/t5-efficient-base-turkish |
Languages | tr (Turkish) |
Fine-tuning Task | Aspect Term Extraction (sequence generation) |
Framework | 🤗 Transformers |
License | Apache-2.0 |
Tokenizer | SentencePiece (T5-style) |
📊 Dataset & Training
- Total samples: 37,000+ Turkish review sentences
- Input: raw sentence, e.g. "Pilav çok lezzetliydi ama servis yavaştı." ("The pilaf was very tasty, but the service was slow.")
- Target: comma-separated aspect terms, e.g. "pilav, servis" ("pilaf, service")
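The input/target format above can be mirrored in two small helpers, for example to build training pairs or to turn a generated string back into a list of terms. This sketch assumes targets are joined with ", " exactly as in the example; `format_target` and `parse_target` are hypothetical names, not part of the released model or its training code.

```python
def format_target(terms):
    """Join aspect terms into the comma-separated target string, e.g. "pilav, servis"."""
    return ", ".join(t.strip() for t in terms if t.strip())


def parse_target(text):
    """Split a generated or gold target string back into a list of lowercased terms."""
    return [t.strip().lower() for t in text.split(",") if t.strip()]


pair = {
    "input": "Pilav çok lezzetliydi ama servis yavaştı.",
    "target": format_target(["pilav", "servis"]),
}
print(pair["target"])                # pilav, servis
print(parse_target(pair["target"]))  # ['pilav', 'servis']
```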
Training Configuration
Setting | Value |
---|---|
Epochs | 3 |
Batch size | 8 |
Max input length | 128 tokens |
Max output length | 64 tokens |
Optimizer | AdamW |
Learning rate | 3e-5 |
Scheduler | Linear |
Precision | FP32 |
Hardware | 1× Tesla T4 / P100 |
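For a rough sense of what this configuration implies, the number of optimizer steps and the linear learning-rate decay can be computed directly. The sketch assumes exactly 37,000 training samples (the card reports 37,000+ in total and does not state the train/test split), no warmup, no gradient accumulation and a single device, so the step count is only indicative. With zero warmup, the decay formula is the same one used by `transformers.get_linear_schedule_with_warmup` when `num_warmup_steps=0`; whether that exact scheduler was used is an assumption.

```python
import math

# Assumptions (not stated exactly in the card): 37,000 samples, no warmup,
# no gradient accumulation, single device.
NUM_SAMPLES = 37_000
BATCH_SIZE = 8
EPOCHS = 3
PEAK_LR = 3e-5

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # 4625
total_steps = steps_per_epoch * EPOCHS                 # 13875


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear decay from PEAK_LR to 0 with no warmup."""
    return PEAK_LR * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, f"{linear_lr(step):.2e}")
```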
🔍 Evaluation
The model was evaluated on a held-out test set using micro-F1 over exactly matching aspect terms and an exact-match score.
Metric | Score |
---|---|
Micro-F1 | 0.84+ |
Exact Match | ~78% |
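The exact scoring procedure is not specified here. One common reading, sketched below as an assumption, is micro-F1 over aspect terms pooled across all test sentences (a predicted term counts only if it exactly matches a gold term after lowercasing and stripping) and sentence-level exact match (the predicted term set equals the gold set). `micro_f1` and `exact_match` are hypothetical helpers, not the author's evaluation script, and the example strings are made up for illustration.

```python
def _terms(text):
    """Parse a comma-separated term string into a set of normalized terms."""
    return {t.strip().lower() for t in text.split(",") if t.strip()}


def micro_f1(preds, golds):
    """Micro-averaged F1: pool true positives, predicted and gold terms over all sentences."""
    tp = n_pred = n_gold = 0
    for p, g in zip(preds, golds):
        p_set, g_set = _terms(p), _terms(g)
        tp += len(p_set & g_set)
        n_pred += len(p_set)
        n_gold += len(g_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def exact_match(preds, golds):
    """Fraction of sentences whose predicted term set equals the gold term set."""
    hits = sum(_terms(p) == _terms(g) for p, g in zip(preds, golds))
    return hits / len(golds) if golds else 0.0


golds = ["pilav, servis", "kahve", "fiyatlar, servis"]
preds = ["pilav, servis", "kahve, masa", "fiyatlar"]
print(f"micro-F1: {micro_f1(preds, golds):.3f}")        # micro-F1: 0.800
print(f"exact match: {exact_match(preds, golds):.3f}")  # exact match: 0.333
```

Micro-averaging weights every term equally, so sentences with many aspects influence the F1 score more than short ones, while exact match treats each sentence as a single pass/fail item.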
💡 Use Cases
- 💬 Opinion mining in Turkish product or service reviews
- 🧾 Aspect-level sentiment analysis preprocessing
- 📊 Feature-based review summarization in NLP pipelines
📦 Model Card / Citation
```bibtex
@misc{Sengil2025T5AspectTR,
  title  = {Sengil/t5-turkish-aspect-term-extractor: Turkish Aspect Term Extraction with T5},
  author = {Şengil, Mert},
  year   = {2025},
  url    = {https://huggingface.co/Sengil/t5-turkish-aspect-term-extractor}
}
```
For contributions, improvements, or issue reporting, feel free to open a GitHub/Hugging Face issue or contact Mert Şengil.