Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting
This repository contains instruction-tuned large language models (LLMs) for real-time scam detection and conversational scam-baiting, trained with privacy-preserving federated learning.
The models are trained and evaluated as part of the paper:
AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning
Model Details
- Developed by: Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- Funded by: U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- Shared by: Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- Model type: Multi-task instruction-tuned LLMs (classification + safe text generation)
- Languages: English
- License: MIT
- Finetuned from: LlamaGuard family & MD-Judge
Model Sources
- Repository: GitHub – supreme-lab/ai-in-the-loop
- Hugging Face: supreme-lab/ai-in-the-loop
- Paper: arXiv preprint arXiv:2509.05362
Uses
Direct Use
- Real-time scam classification (scam vs. non-scam conversations)
- Conversational scam-baiting to safely waste scammers' time
- PII risk scoring to filter unsafe outputs
Downstream Use
- Integration into messaging platforms for scam prevention
- Benchmarks for AI safety alignment in adversarial contexts
- Research in federated privacy-preserving LLMs
Out-of-Scope Use
- Should not be used as a replacement for law enforcement tools
- Should not be deployed without safety filters and human-in-the-loop monitoring
- Not intended for financial or medical decision-making
Bias, Risks, and Limitations
- Models may over-engage with scammers in rare cases
- Possible false positives in benign conversations
- Cultural/linguistic bias: trained primarily on English data
- Risk of hallucination when generating long responses
Recommendations
- Always deploy with safety thresholds (δ, θ₁, θ₂); see the gating sketch after this list
- Use in controlled environments first (research, simulations)
- Extend to multilingual settings before real-world deployment
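As a minimal sketch of such threshold gating, assuming δ caps PII-leakage risk and θ₁/θ₂ bound scam confidence (this mapping and all names below are illustrative assumptions, not the paper's exact formulation):

def gate_response(scam_score: float, pii_risk: float,
                  delta: float = 0.5, theta1: float = 0.3,
                  theta2: float = 0.8) -> str:
    # Hypothetical gate; delta/theta1/theta2 semantics are assumed
    # for illustration and are not taken from the paper.
    if pii_risk > delta:        # assumed: delta caps PII-leakage risk
        return "block"          # never emit text that may expose PII
    if scam_score >= theta2:    # assumed: theta2 = confident scam
        return "bait"           # hand off to the scam-baiting model
    if scam_score >= theta1:    # assumed: theta1 = uncertain region
        return "escalate"       # human-in-the-loop review
    return "allow"              # treat as benign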
How to Get Started
from transformers import AutoModelForCausalLM, AutoTokenizer
# Replace <x> with 2 or 3, or drop "-<x>" entirely for the
# llama-guard-multi-task variant.
model_id = "supreme-lab/ai-in-the-loop/llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Classify (and optionally respond to) a suspicious message.
inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
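LlamaGuard-style models are usually prompted through their chat template rather than raw text. Whether these fine-tuned checkpoints retain the base model's template is an assumption, so treat the following as a sketch:

chat = [{"role": "user", "content": "Scammer: Hello, I need your SSN."}]
# Assumes the checkpoint keeps a chat template from its
# LlamaGuard/MD-Judge base (not confirmed by this card).
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))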
Training Details
Training Data
- Classification: SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- Generation: SBC (254 real scam-baiting conversations), ASB (over 37,000 messages), YTSC (YouTube scam transcriptions)
- Auxiliary: ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset
Training Procedure
- Fine-tuning setup (see the LoRA sketch below):
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- Federated Learning (FL) (see the FedAvg sketch below):
  - 10 simulated clients, 30 rounds of FedAvg
  - Optional differential privacy (noise multipliers: 0.1 and 0.8)
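A minimal sketch of the fine-tuning configuration with Hugging Face peft and transformers, using the hyperparameters listed above; target_modules and all other unlisted settings are assumptions:

from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                                  # LoRA rank from the setup above
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    bf16=True,                            # mixed precision
    optim="adamw_torch",                  # AdamW
)
# model = get_peft_model(base_model, lora_config), then train with Trainer.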
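A toy sketch of one FedAvg round with optional Gaussian noise standing in for differential privacy; client sampling, gradient clipping, and the paper's exact DP mechanism are omitted, so this is illustrative only:

import torch

def fedavg_round(client_states: list[dict], noise_multiplier: float = 0.0) -> dict:
    # Average each parameter across the (here, 10) client updates.
    aggregated = {}
    for name in client_states[0]:
        avg = torch.stack([s[name] for s in client_states]).mean(dim=0)
        if noise_multiplier > 0:  # e.g., 0.1 or 0.8 as listed above
            avg = avg + noise_multiplier * torch.randn_like(avg)
        aggregated[name] = avg
    return aggregated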
Evaluation
Metrics
- Classification: F1, AUPRC, FPR, FNR (see the metrics sketch after this list)
- Generation: Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
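A minimal sketch of the classification metrics with scikit-learn, deriving FPR/FNR from the confusion matrix (toy labels for illustration):

from sklearn.metrics import average_precision_score, confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0]             # 1 = scam, 0 = non-scam (toy data)
y_pred = [1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1]  # scam-class confidence

f1 = f1_score(y_true, y_pred)
auprc = average_precision_score(y_true, y_score)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr, fnr = fp / (fp + tn), fn / (fn + tp)
print(f"F1={f1:.2f} AUPRC={auprc:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")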
Results
- Classification: BiGRU/BiLSTM baselines exceed 0.99 F1; RoBERTa is competitive
- Instruction-tuned LLMs: MD-Judge performs best overall (F1 ≥ 0.89); LlamaGuard 3 is strong for moderation
- Generation: MD-Judge achieved the lowest perplexity (22.3), the highest engagement score (0.79), and 96% safety compliance in human evaluations
Environmental Impact
- Hardware: NVIDIA H100 GPUs
- Training Time: ~30 hrs across models
- Federated Setup: 10 simulated clients, 30 rounds
Technical Specifications
- Architecture: Instruction-tuned transformer (decoder-only)
- Objective: Multi-task (classification, risk scoring, safe generation)
Citation
If you use these models, please cite our paper:
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  year={2025}
}
Contact
- Authors: [email protected], [email protected], [email protected]
- Lab: Supreme Lab
- Personal Web: https://ismail102.github.io/
Model Tree
- Base model: OpenSafetyLab/MD-Judge-v0.1