A BERT model for Indonesian spam detection with 95% overall accuracy. This v3 model was fine-tuned from the v2 model on an email dataset to improve performance on Indonesian content.
from transformers import pipeline
# The easiest way to use the model
classifier = pipeline(
    "text-classification",
    model="nahiar/spam-detection-bert-v3",
    tokenizer="nahiar/spam-detection-bert-v3",
)
# Test with text
texts = [
    "lacak hp hilang by no hp / imei lacak penipu/scammer/tabrak lari/terror/revengeporn sadap / hack / pulihkan akun",
    "Senin, 21 Juli 2025, Samapta Polsek Ngaglik melaksanakan patroli stasioner balong jalan palagan donoharjo",
    "Mari berkontribusi terhadap gerakan rakyat dengan membeli baju ini seharga Rp 160.000. Hubungi kami melalui WA 08977472296",
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Result: {result['label']} (confidence: {result['score']:.4f})")
    print("---")
Metric | HAM | SPAM | Overall |
---|---|---|---|
Precision | 98% | 77% | 95% |
Recall | 96% | 85% | 95% |
F1-Score | 97% | 81% | 95% |
Overall Accuracy | - | - | 95% |
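As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes the per-class F1 from the reported precision and recall; the helper name `f1_score` is illustrative and not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class values copied from the metrics table.
reported = {
    "HAM": {"precision": 0.98, "recall": 0.96, "f1": 0.97},
    "SPAM": {"precision": 0.77, "recall": 0.85, "f1": 0.81},
}

for label, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{label}: computed F1 = {computed:.2f}, reported F1 = {m['f1']:.2f}")
```

Both classes reproduce the reported F1 after rounding to two decimals.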
0: "HAM" (not spam)
1: "SPAM" (spam)
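Depending on how a checkpoint's configuration is set up, a `transformers` pipeline can return either named labels or generic ones of the form `LABEL_0` / `LABEL_1`. Which form this model returns is not stated here, so the following is a minimal sketch that accepts both, using the mapping above. The helper `normalize_label` is hypothetical.

```python
# Mapping taken from the label list above.
ID2LABEL = {0: "HAM", 1: "SPAM"}


def normalize_label(raw_label: str) -> str:
    """Return "HAM" or "SPAM" for either a named or a generic LABEL_n label."""
    upper = raw_label.strip().upper()
    if upper in ID2LABEL.values():
        return upper
    if upper.startswith("LABEL_"):
        suffix = upper.split("_", 1)[1]
        if suffix.isdigit() and int(suffix) in ID2LABEL:
            return ID2LABEL[int(suffix)]
    raise ValueError(f"Unexpected label: {raw_label!r}")


print(normalize_label("LABEL_1"))
print(normalize_label("ham"))
```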
This model was retrained using:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("nahiar/spam-detection-bert-v3")
model = AutoModelForSequenceClassification.from_pretrained("nahiar/spam-detection-bert-v3")
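def predict_spam(text):
    # Tokenize a single string into PyTorch tensors, truncating long inputs
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to class probabilities
    probs = torch.softmax(outputs.logits, dim=1)
    predicted_label = torch.argmax(probs, dim=1).item()
    confidence = probs[0][predicted_label].item()
    label_map = {0: "HAM", 1: "SPAM"}
    return label_map[predicted_label], confidence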
# Test
text = "Dapatkan uang dengan mudah! Klik link ini sekarang!"
result, confidence = predict_spam(text)
print(f"Prediction: {result} (Confidence: {confidence:.4f})")
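The softmax and label-selection steps inside `predict_spam` can be illustrated without downloading the model. The sketch below uses plain Python and made-up logits. It also adds an optional `spam_threshold` parameter, which is an assumption and not part of the model: because SPAM precision in the table is lower than HAM precision, a caller might want to flag SPAM only above a stricter probability.

```python
import math

LABEL_MAP = {0: "HAM", 1: "SPAM"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, spam_threshold=0.5):
    """Return (label, spam_probability) for a two-class [HAM, SPAM] logit pair."""
    spam_prob = softmax(logits)[1]
    label = LABEL_MAP[1] if spam_prob >= spam_threshold else LABEL_MAP[0]
    return label, spam_prob


# Made-up logits for illustration only, not real model output.
example_logits = [0.2, 1.3]
print(decide(example_logits))                      # default threshold
print(decide(example_logits, spam_threshold=0.9))  # stricter threshold
```

With two classes, a threshold of 0.5 gives the same decision as the argmax in `predict_spam` (ties aside). Raising it trades SPAM recall for precision.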
@misc{nahiar_spam_detection_bert,
title={Indonesian Spam Detection BERT},
author={Raihan Hidayatullah Djunaedi},
year={2025},
url={https://huggingface.co/nahiar/spam-detection-bert-v3}
}