Sengil/t5-turkish-aspect-term-extractor 🇹🇷

A Turkish sequence-to-sequence model based on Turkish-NLP/t5-efficient-base-turkish, fine-tuned for Aspect Term Extraction (ATE) from customer reviews and sentences.

Given a Turkish sentence, the model generates a list of aspect terms (e.g., kahve, servis, fiyatlar) that reflect the primary discussed entities or features.
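
For a quick sanity check, the standard 🤗 text2text-generation pipeline should work with this checkpoint (a minimal sketch; the full example below adds beam search and frequency-based ranking):

from transformers import pipeline

# Minimal sketch: the pipeline handles tokenization, generation, and decoding.
extractor = pipeline("text2text-generation", model="Sengil/t5-turkish-aspect-term-extractor")

result = extractor("Pilav çok lezzetliydi ama servis yavaştı.", max_new_tokens=64)
print(result[0]["generated_text"])  # expected: comma-separated aspect terms, e.g. "pilav, servis"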


✨ Example

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import re
from collections import Counter

# Load the fine-tuned model and tokenizer
MODEL_ID = "Sengil/t5-turkish-aspect-term-extractor"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(DEVICE)
model.eval()

TURKISH_STOPWORDS = {
    "ve", "çok", "ama", "bir", "bu", "daha", "gibi", "ile", "için",
    "de", "da", "ki", "o", "şu", "sen", "biz", "siz", "onlar"
}

def is_valid_aspect(word):
    # Keep only multi-character, purely alphabetic tokens that are not stopwords.
    word = word.strip().lower()
    return (
        len(word) > 1 and
        word not in TURKISH_STOPWORDS and
        word.isalpha()
    )

def extract_and_rank_aspects(text, max_tokens=64, beams=5):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(DEVICE)

    # Return every beam hypothesis so terms can be ranked by how often they recur.
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_tokens,
            num_beams=beams,
            num_return_sequences=beams,
            early_stopping=True
        )

    all_predictions = [
        tokenizer.decode(output, skip_special_tokens=True)
        for output in outputs
    ]


    all_terms = []
    for pred in all_predictions:
        # Split each hypothesis on common separators, then normalize and filter the candidates.
        candidates = re.split(r"\s*[;,–—\-]\s*", pred)
        all_terms.extend([w.strip().lower() for w in candidates if is_valid_aspect(w)])

    ranked = Counter(all_terms).most_common()
    return ranked


# Inference
text = "Artılar: Göl manzarasıyla harika bir atmosfer, Ipoh'un her zaman sıcak olan havası nedeniyle iyi bir klima olan restoran, iyi ve hızlı hizmet sunan garsonlar, temassız ödeme kabul eden e-cüzdan, ücretsiz otopark ama sıcak güneş altında açık, yemeklerin tadı güzel."
ranked_aspects = extract_and_rank_aspects(text)

print("Sorted Aspect Terms:")
for term, score in ranked_aspects:
    print(f"{term:<15}  skor: {score}")

Output:

Sorted Aspect Terms:
atmosfer         skor: 1
servis           skor: 1
restoran         skor: 1
hizmet           skor: 1

📌 Model Details

  • Model Type: AutoModelForSeq2SeqLM (T5-style)
  • Base Model: Turkish-NLP/t5-efficient-base-turkish
  • Languages: tr (Turkish)
  • Parameters: ~619M (F32 safetensors)
  • Fine-tuning Task: Aspect Term Extraction (sequence generation)
  • Framework: 🤗 Transformers
  • License: Apache-2.0
  • Tokenizer: SentencePiece (T5-style)

📊 Dataset & Training

  • Total samples: 37,000+ Turkish review sentences
  • Input: Raw sentence (e.g., "Pilav çok lezzetliydi ama servis yavaştı.")
  • Target: Comma-separated aspect terms (e.g., "pilav, servis")
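
A sketch of how one such pair could be encoded for seq2seq fine-tuning (the exact preprocessing script is not published here; this reuses the tokenizer loaded in the example above and the max lengths from the configuration below):

# Illustrative preprocessing for one (sentence, aspects) pair; variable names are not from the original script.
sentence = "Pilav çok lezzetliydi ama servis yavaştı."
target = "pilav, servis"

model_inputs = tokenizer(sentence, max_length=128, truncation=True)
labels = tokenizer(text_target=target, max_length=64, truncation=True)
model_inputs["labels"] = labels["input_ids"]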

Training Configuration

  • Epochs: 3
  • Batch size: 8
  • Max input length: 128 tokens
  • Max output length: 64 tokens
  • Optimizer: AdamW
  • Learning rate: 3e-5
  • Scheduler: Linear
  • Precision: FP32
  • Hardware: 1× Tesla T4 / P100
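
For orientation, a hypothetical Seq2SeqTrainer setup matching the settings above (this is not the original training script; output_dir and the dataset variables are placeholders):

from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq

# Sketch of training arguments mirroring the configuration listed above.
args = Seq2SeqTrainingArguments(
    output_dir="t5-turkish-ate",          # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    fp16=False,                           # FP32 precision, per the configuration above
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,                          # model and tokenizer loaded in the example above
    args=args,                            # AdamW is the Trainer's default optimizer
    train_dataset=train_dataset,          # placeholder: tokenized (sentence, aspects) pairs
    eval_dataset=eval_dataset,            # placeholder
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()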

🔍 Evaluation

The model was evaluated on a held-out test set using exact-match micro-F1 over the predicted aspect terms, alongside overall exact match.

  • Micro-F1: 0.84+
  • Exact Match: ~78%
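
A plausible way to compute both numbers from comma-separated outputs, treating each prediction/reference as a set of terms (the actual evaluation script may differ):

def parse_terms(text):
    # Normalize a comma-separated string into a set of lowercase terms.
    return {t.strip().lower() for t in text.split(",") if t.strip()}

def evaluate(predictions, references):
    tp = fp = fn = exact = 0
    for pred, ref in zip(predictions, references):
        p, r = parse_terms(pred), parse_terms(ref)
        tp += len(p & r)
        fp += len(p - r)
        fn += len(r - p)
        exact += int(p == r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    micro_f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"micro_f1": micro_f1, "exact_match": exact / len(references)}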

💡 Use Cases

  • 💬 Opinion mining in Turkish product or service reviews
  • 🧾 Aspect-level sentiment analysis preprocessing (see the sketch after this list)
  • 📊 Feature-based review summarization in NLP pipelines
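
For the aspect-level sentiment use case, the handoff can be as simple as pairing each extracted term with its source sentence; the downstream sentiment classifier is not part of this model and is left to your pipeline:

# Build (sentence, aspect) pairs for a downstream aspect-sentiment classifier (not provided by this repo).
sentence = "Pilav çok lezzetliydi ama servis yavaştı."
aspects = [term for term, _ in extract_and_rank_aspects(sentence)]

absa_inputs = [{"text": sentence, "aspect": a} for a in aspects]
# e.g. [{"text": "...", "aspect": "pilav"}, {"text": "...", "aspect": "servis"}]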

📦 Model Card / Citation

@misc{Sengil2025T5AspectTR,
  title   = {Sengil/t5-turkish-aspect-term-extractor: Turkish Aspect Term Extraction with T5},
  author  = {Şengil, Mert},
  year    = {2025},
  url     = {https://huggingface.co/Sengil/t5-turkish-aspect-term-extractor}
}

For contributions, improvements, or issue reporting, feel free to open a GitHub/Hugging Face issue or contact Mert Şengil.
