Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting
This repository contains instruction-tuned large language models (LLMs) for real-time scam detection and conversational scam-baiting, trained with privacy-preserving federated learning.
The models are trained and evaluated as part of the paper:
AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning
Model Details
- Developed by: Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- Funded by: U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- Shared by: Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- Model type: Multi-task instruction-tuned LLMs (classification + safe text generation)
- Languages: English
- License: MIT
- Finetuned from: LlamaGuard family & MD-Judge
Model Sources
- Repository: GitHub – supreme-lab/ai-in-the-loop
- Hugging Face: supreme-lab/ai-in-the-loop
- Paper: arXiv preprint arXiv:2509.05362
Uses
Direct Use
- Real-time scam classification (scam vs. non-scam conversations)
- Conversational scam-baiting to safely waste scammers' time
- PII risk scoring to filter unsafe outputs
Downstream Use
- Integration into messaging platforms for scam prevention
- Benchmarks for AI safety alignment in adversarial contexts
- Research in federated privacy-preserving LLMs
Out-of-Scope Use
- Should not be used as a replacement for law enforcement tools
- Should not be deployed without safety filters and human-in-the-loop monitoring
- Not intended for financial or medical decision-making
Bias, Risks, and Limitations
- Models may over-engage with scammers in rare cases
- Possible false positives in benign conversations
- Cultural/linguistic bias: trained primarily on English data
- Risk of hallucination when generating long responses
Recommendations
- Always deploy with safety thresholds (δ, θ₁, θ₂); see the gating sketch after this list
- Use in controlled environments first (research, simulations)
- Extend to multilingual settings before real-world deployment
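As a minimal sketch of such threshold gating, assuming δ caps PII-leakage risk and θ₁/θ₂ bound scam confidence (this mapping and all names below are illustrative assumptions, not the paper's exact formulation):

def gate_response(scam_score: float, pii_risk: float,
                  delta: float = 0.5, theta1: float = 0.3,
                  theta2: float = 0.8) -> str:
    # Hypothetical gate; delta/theta1/theta2 semantics are assumed
    # for illustration and are not taken from the paper.
    if pii_risk > delta:        # assumed: delta caps PII-leakage risk
        return "block"          # never emit text that may expose PII
    if scam_score >= theta2:    # assumed: theta2 = confident scam
        return "bait"           # hand off to the scam-baiting model
    if scam_score >= theta1:    # assumed: theta1 = uncertain region
        return "escalate"       # human-in-the-loop review
    return "allow"              # treat as benign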
How to Get Started
from transformers import AutoModelForCausalLM, AutoTokenizer
# Replace <x> with 2 or 3, or drop "-<x>" entirely for the
# llama-guard-multi-task variant.
model_id = "supreme-lab/ai-in-the-loop/llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Classify (and optionally respond to) a suspicious message.
inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
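LlamaGuard-style models are usually prompted through their chat template rather than raw text. Whether these fine-tuned checkpoints retain the base model's template is an assumption, so treat the following as a sketch:

chat = [{"role": "user", "content": "Scammer: Hello, I need your SSN."}]
# Assumes the checkpoint keeps a chat template from its
# LlamaGuard/MD-Judge base (not confirmed by this card).
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))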
Training Details
Training Data
- Classification: SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- Generation: SBC (254 real scam-baiting conversations), ASB (over 37,000 messages), YTSC (YouTube scam transcriptions)
- Auxiliary: ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset
Training Procedure
- Fine-tuning setup (see the LoRA sketch below):
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- Federated Learning (FL) (see the FedAvg sketch below):
  - 10 simulated clients, 30 rounds of FedAvg
  - Optional differential privacy (noise multipliers: 0.1 and 0.8)
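A minimal sketch of the fine-tuning configuration with Hugging Face peft and transformers, using the hyperparameters listed above; target_modules and all other unlisted settings are assumptions:

from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                                  # LoRA rank from the setup above
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    bf16=True,                            # mixed precision
    optim="adamw_torch",                  # AdamW
)
# model = get_peft_model(base_model, lora_config), then train with Trainer.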
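A toy sketch of one FedAvg round with optional Gaussian noise standing in for differential privacy; client sampling, gradient clipping, and the paper's exact DP mechanism are omitted, so this is illustrative only:

import torch

def fedavg_round(client_states: list[dict], noise_multiplier: float = 0.0) -> dict:
    # Average each parameter across the (here, 10) client updates.
    aggregated = {}
    for name in client_states[0]:
        avg = torch.stack([s[name] for s in client_states]).mean(dim=0)
        if noise_multiplier > 0:  # e.g., 0.1 or 0.8 as listed above
            avg = avg + noise_multiplier * torch.randn_like(avg)
        aggregated[name] = avg
    return aggregated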
Evaluation
Metrics
- Classification: F1, AUPRC, FPR, FNR (see the metrics sketch after this list)
- Generation: Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
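A minimal sketch of the classification metrics with scikit-learn, deriving FPR/FNR from the confusion matrix (toy labels for illustration):

from sklearn.metrics import average_precision_score, confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0]             # 1 = scam, 0 = non-scam (toy data)
y_pred = [1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1]  # scam-class confidence

f1 = f1_score(y_true, y_pred)
auprc = average_precision_score(y_true, y_score)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr, fnr = fp / (fp + tn), fn / (fn + tp)
print(f"F1={f1:.2f} AUPRC={auprc:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")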
Results
- Classification: BiGRU/BiLSTM baselines exceed 0.99 F1; RoBERTa is competitive
- Instruction-tuned LLMs: MD-Judge performs best overall (F1 ≥ 0.89); LlamaGuard 3 is strong for moderation
- Generation: MD-Judge achieved the lowest perplexity (22.3), the highest engagement score (0.79), and 96% safety compliance in human evaluations
Environmental Impact
- Hardware: NVIDIA H100 GPUs
- Training Time: ~30 hrs across models
- Federated Setup: 10 simulated clients, 30 rounds
Technical Specifications
- Architecture: Instruction-tuned transformer (decoder-only)
- Objective: Multi-task (classification, risk scoring, safe generation)
Citation
If you use these models, please cite our paper:
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  year={2025}
}
Contact
- Authors: [email protected], [email protected], [email protected]
- Lab: Supreme Lab
- Personal Web: https://ismail102.github.io/
Model Tree
- Base model: OpenSafetyLab/MD-Judge-v0.1