ModernBERT DGA Detector

This model classifies domain names as either legitimate or generated by a Domain Generation Algorithm (DGA).

Model Description

  • Model Type: BERT-based sequence classification
  • Task: Binary classification (Legitimate vs DGA domains)
  • Base Model: ModernBERT-base
  • Training Data: Domain names dataset
  • Authors: Reynier Leyva La O, Carlos A. Catania

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Reynier/modernbert-dga-detector")
model = AutoModelForSequenceClassification.from_pretrained("Reynier/modernbert-dga-detector")

# Example prediction
def predict_domain(domain):
    # Tokenize a single domain; inputs longer than 64 tokens are truncated
    inputs = tokenizer(domain, return_tensors="pt", max_length=64, truncation=True, padding=True)
    with torch.no_grad():  # inference only, so gradients are not tracked
        outputs = model(**inputs)
    # Convert logits to probabilities; index 0 = LEGITIMATE, index 1 = DGA
    predictions = torch.softmax(outputs.logits, dim=-1)
    legit_prob = predictions[0][0].item()
    dga_prob = predictions[0][1].item()
    return {
        "prediction": "DGA" if dga_prob > legit_prob else "LEGITIMATE",
        "confidence": max(legit_prob, dga_prob),
    }

# Test examples
domains = ["google.com", "xkvbzpqr.net", "facebook.com", "abcdef123456.com"]
for domain in domains:
    result = predict_domain(domain)
    print(f"{domain} -> {result['prediction']} (confidence: {result['confidence']:.3f})")

Model Architecture

The model is based on ModernBERT and fine-tuned for domain classification:

  • Input: Domain names (text)
  • Output: Binary classification (0=LEGITIMATE, 1=DGA)
  • Max sequence length: 64 tokens
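
The output convention above (index 0 = LEGITIMATE, index 1 = DGA) can be illustrated without loading the model. The following is a minimal sketch in plain Python, assuming the classifier head emits two logits per domain. The helper names `softmax` and `decode`, the `ID2LABEL` dictionary, and the example logits are all hypothetical and not part of the released model.

```python
import math

# Label mapping as stated in this card: index 0 = LEGITIMATE, index 1 = DGA.
# ID2LABEL is a hypothetical name used only for this sketch.
ID2LABEL = {0: "LEGITIMATE", 1: "DGA"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration; these are not real model outputs.
label, prob = decode([-1.2, 2.3])
print(label, round(prob, 3))
```

With the real model, the same decoding is what `torch.softmax` followed by an index comparison does in the Usage section.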

Training Details

This model was fine-tuned on a dataset of legitimate and DGA-generated domains using:

  • Base model: answerdotai/ModernBERT-base
  • Framework: Transformers/PyTorch
  • Task: Binary sequence classification

Performance

Reported evaluation metrics:

  • Accuracy: 0.9658 ± 0.0153
  • Precision: 0.9704 ± 0.0253
  • Recall: 0.9582 ± 0.0147
  • F1-Score: 0.9579 ± 0.0167
  • FPR: 0.0267 ± 0.0233
  • TPR: 0.9582 ± 0.0147
  • Query Time: 0.1226 ± 0.0253 on CPU (no GPU required)
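
For readers who want to see how the metrics listed above relate to each other, here is a minimal sketch of the standard definitions computed from confusion-matrix counts, treating DGA as the positive class. The function name `binary_metrics` and the counts in the example are hypothetical; they are not the authors' evaluation code or data. The sketch also shows why Recall and TPR carry the same value: they are the same quantity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn count DGA domains (positive class), tn/fp count legitimate ones.
    Assumes both classes are present and at least one positive prediction
    was made, so no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall and TPR are the same quantity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "tpr": recall,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```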

Use Cases

  • Cybersecurity: Detect malicious domains generated by malware
  • Network Security: Filter potentially harmful domains
  • Threat Intelligence: Analyze domain patterns in security feeds
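
As one possible way to apply the filtering use case, the sketch below flags domains whose DGA probability meets a threshold. It assumes a scoring callable that returns the DGA probability for a domain; `flag_domains`, `toy_score`, and the 0.9 threshold are hypothetical choices for illustration, and the scores are made up. In practice the scorer could wrap a function such as `predict_domain` from the Usage section, and the threshold would be tuned to the false positive rate a deployment can tolerate.

```python
from typing import Callable, Iterable, List, Tuple


def flag_domains(
    domains: Iterable[str],
    dga_score: Callable[[str], float],
    threshold: float = 0.9,  # arbitrary illustrative value, not a recommendation
) -> List[Tuple[str, float]]:
    """Return (domain, score) pairs whose DGA probability meets the threshold."""
    flagged = []
    for domain in domains:
        score = dga_score(domain)
        if score >= threshold:
            flagged.append((domain, score))
    return flagged


def toy_score(domain: str) -> float:
    """Stand-in scorer with made-up values; replace with a real model call."""
    made_up = {"google.com": 0.02, "xkvbzpqr.net": 0.97}
    return made_up.get(domain, 0.5)


print(flag_domains(["google.com", "xkvbzpqr.net", "example.org"], toy_score))
```

A higher threshold trades recall for fewer false alarms, which usually matters when flagged domains are blocked automatically.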

Limitations

  • This model is trained specifically for domain name classification and is not intended for other text inputs
  • Performance may vary on domains from different TLDs or languages
  • Regular retraining may be needed as DGA techniques evolve
  • Model performance depends on the quality and diversity of training data

Citation

If you use this model in your research or applications, please cite it appropriately.

Related Models

Check out the author's other security models:
