ModernBERT DGA Detector
This model classifies domain names as either legitimate or generated by a Domain Generation Algorithm (DGA).
Model Description
- Model Type: BERT-based sequence classification
- Task: Binary classification (Legitimate vs DGA domains)
- Base Model: ModernBERT-base
- Training Data: Domain names dataset
- Authors: Reynier Leyva La O, Carlos A. Catania
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Reynier/modernbert-dga-detector")
model = AutoModelForSequenceClassification.from_pretrained("Reynier/modernbert-dga-detector")
# Example prediction
def predict_domain(domain):
    inputs = tokenizer(domain, return_tensors="pt", max_length=64, truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    legit_prob = predictions[0][0].item()
    dga_prob = predictions[0][1].item()
    return {"prediction": "DGA" if dga_prob > legit_prob else "LEGITIMATE",
            "confidence": max(legit_prob, dga_prob)}
# Test examples
domains = ["google.com", "xkvbzpqr.net", "facebook.com", "abcdef123456.com"]
for domain in domains:
    result = predict_domain(domain)
    print(f"{domain} -> {result['prediction']} (confidence: {result['confidence']:.3f})")
Model Architecture
The model is based on ModernBERT and fine-tuned for domain classification:
- Input: Domain names (text)
- Output: Binary classification (0=LEGITIMATE, 1=DGA)
- Max sequence length: 64 tokens
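To make the output mapping above concrete, here is a minimal sketch in plain Python that turns a pair of raw logits into a label and class probabilities. The `decode_logits` helper and its `dga_threshold` parameter are hypothetical names introduced for illustration; only the 0=LEGITIMATE, 1=DGA mapping comes from this card.

```python
import math

# Label mapping stated in this model card: 0=LEGITIMATE, 1=DGA.
ID2LABEL = {0: "LEGITIMATE", 1: "DGA"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits, dga_threshold=0.5):
    """Map two raw logits to a label and per-class probabilities.

    dga_threshold is an assumed, adjustable cut-off (hypothetical);
    at 0.5 it matches the "larger probability wins" rule.
    """
    probs = softmax(logits)
    label_id = 1 if probs[1] > dga_threshold else 0
    return {
        "label": ID2LABEL[label_id],
        "legit_prob": probs[0],
        "dga_prob": probs[1],
    }


# Example with made-up logits, not real model output.
print(decode_logits([2.0, -1.0]))
```

Raising `dga_threshold` above 0.5 would trade recall for fewer false positives, which may matter when blocking traffic.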
Training Details
This model was fine-tuned on a dataset of legitimate and DGA-generated domains using:
- Base model: answerdotai/ModernBERT-base
- Framework: Transformers/PyTorch
- Task: Binary sequence classification
Performance
Reported metrics:
- Accuracy: 0.9658 ± 0.0153
- Precision: 0.9704 ± 0.0253
- Recall: 0.9582 ± 0.0147
- F1-Score: 0.9579 ± 0.0167
- FPR: 0.0267 ± 0.0233
- TPR: 0.9582 ± 0.0147
- Query time: 0.1226 ± 0.0253 on CPU (no GPU required)
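For readers who want to reproduce these metrics on their own data, the sketch below shows how each one is derived from confusion-matrix counts, treating DGA as the positive class. The `binary_metrics` helper is a hypothetical name, and the counts in the example are illustrative only, not results from this model.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute classification metrics from confusion-matrix counts.

    DGA is treated as the positive class. Ratios with an empty
    denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # identical to TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
        "tpr": recall,
    }


# Illustrative counts only (not taken from this model's evaluation).
metrics = binary_metrics(tp=95, fp=3, tn=97, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that TPR and recall are the same quantity, which is why the two values listed above coincide.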
Use Cases
- Cybersecurity: Detect malicious domains generated by malware
- Network Security: Filter potentially harmful domains
- Threat Intelligence: Analyze domain patterns in security feeds
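As one possible shape for the filtering use case, the sketch below splits a list of domains into allowed and flagged groups based on a DGA probability. The `filter_domains` helper, the `score_fn` callback, the `threshold` value, and the normalization step (lowercasing, stripping a trailing dot) are all assumptions made for illustration. In practice `score_fn` would wrap the model and return its DGA probability; here a toy stand-in is used so the example runs without downloading anything.

```python
from typing import Callable, Iterable, List, Tuple


def filter_domains(
    domains: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split domains into (allowed, flagged) by DGA probability.

    score_fn is a hypothetical callback returning a DGA probability
    in [0, 1]; threshold is an assumed cut-off.
    """
    allowed, flagged = [], []
    for raw in domains:
        # Assumed normalization: trim, lowercase, drop a trailing dot.
        domain = raw.strip().lower().rstrip(".")
        if not domain:
            continue  # skip blank entries
        if score_fn(domain) > threshold:
            flagged.append(domain)
        else:
            allowed.append(domain)
    return allowed, flagged


def toy_score(domain: str) -> float:
    """Placeholder scorer, NOT the model: flags domains containing digits."""
    return 0.9 if any(ch.isdigit() for ch in domain) else 0.1


allowed, flagged = filter_domains(["Google.com.", "abc123.net", "  "], toy_score)
print("allowed:", allowed)
print("flagged:", flagged)
```

Keeping the scorer as a parameter makes it straightforward to swap the toy function for a real model call or to cache scores for repeated domains.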
Limitations
- This model is trained specifically for domain classification
- Performance may vary on domains from different TLDs or languages
- Regular retraining may be needed as DGA techniques evolve
- Model performance depends on the quality and diversity of training data
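One way to check the TLD-related limitation on your own data is to break accuracy down per top-level domain. The sketch below assumes you already have labeled domains and predictions; the `accuracy_by_tld` helper is a hypothetical name, the records are made up, and the TLD is taken naively as the last dot-separated label (so `example.co.uk` would count under `uk`).

```python
from collections import defaultdict


def accuracy_by_tld(records):
    """Return {tld: accuracy} for (domain, true_label, predicted_label) records.

    The TLD is approximated as the last dot-separated label, which is an
    assumption and does not handle multi-part suffixes such as co.uk.
    """
    counts = defaultdict(lambda: [0, 0])  # tld -> [correct, total]
    for domain, true_label, predicted_label in records:
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        counts[tld][1] += 1
        if true_label == predicted_label:
            counts[tld][0] += 1
    return {tld: correct / total for tld, (correct, total) in counts.items()}


# Made-up records for illustration, not real evaluation data.
records = [
    ("google.com", "LEGITIMATE", "LEGITIMATE"),
    ("xkvbzpqr.net", "DGA", "DGA"),
    ("example.net", "LEGITIMATE", "DGA"),
    ("abc.COM", "DGA", "DGA"),
]
print(accuracy_by_tld(records))
```

A TLD whose accuracy falls well below the overall figure would be a candidate for additional training data.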
Citation
If you use this model in your research or applications, please cite it appropriately.
Related Models
See the authors' other security models on their Hugging Face profile.