Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains instruction-tuned large language models (LLMs) designed for real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning.
The models are trained and evaluated as part of the paper:
AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning


Model Details

  • Developed by: Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
  • Funded by: U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
  • Shared by: Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
  • Model type: Multi-task instruction-tuned LLMs (classification + safe text generation)
  • Languages: English
  • License: MIT
  • Finetuned from: LlamaGuard family & MD-Judge

Model Sources


Uses

Direct Use

  • Real-time scam classification (scam vs. non-scam conversations)
  • Conversational scam-baiting to waste scammer time safely
  • PII risk scoring to filter unsafe outputs

Downstream Use

  • Integration into messaging platforms for scam prevention
  • Benchmarks for AI safety alignment in adversarial contexts
  • Research in federated privacy-preserving LLMs

Out-of-Scope Use

  • Should not be used as a replacement for law enforcement tools
  • Should not be deployed without safety filters and human-in-the-loop monitoring
  • Not intended for financial or medical decision-making

Bias, Risks, and Limitations

  • Models may over-engage with scammers in rare cases
  • Possible false positives in benign conversations
  • Cultural/linguistic bias: trained primarily on English data
  • Risk of hallucination when generating long responses

Recommendations

  • Always deploy with safety thresholds (δ, θ1, θ2)
  • Use in controlled environments first (research, simulations)
  • Extend to multilingual settings before real-world deployment
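
The card does not define the thresholds δ, θ1, and θ2, so the sketch below is only one plausible reading, not the authors' implementation. It assumes δ is the minimum scam score before the system engages, and that θ1 < θ2 split the PII risk of a candidate reply into send, regenerate, and block. The names `Thresholds` and `gate_reply` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    delta: float   # ASSUMED meaning: minimum scam score before engaging
    theta1: float  # ASSUMED meaning: PII risk below this is safe to send
    theta2: float  # ASSUMED meaning: PII risk at or above this is blocked

    def __post_init__(self) -> None:
        if not self.theta1 <= self.theta2:
            raise ValueError("theta1 must not exceed theta2")


def gate_reply(scam_score: float, pii_risk: float, t: Thresholds) -> str:
    """Map a scam score and a PII risk score to a deployment action."""
    if scam_score < t.delta:
        return "no_action"   # conversation not flagged as a scam
    if pii_risk < t.theta1:
        return "send"        # candidate reply considered safe
    if pii_risk < t.theta2:
        return "regenerate"  # borderline: ask the model for another reply
    return "block"           # too risky: hand over to a human reviewer


if __name__ == "__main__":
    # Illustrative values only; the card does not state recommended settings.
    limits = Thresholds(delta=0.5, theta1=0.3, theta2=0.7)
    print(gate_reply(scam_score=0.9, pii_risk=0.5, t=limits))
```

Keeping the thresholds in one immutable object makes it easier to log and audit the exact settings used in a given deployment.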

How to Get Started

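from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repo IDs have the form "namespace/name", so the variant is passed as a
# subfolder (this assumes each variant lives in a same-named directory of the repo).
repo_id = "supreme-lab/ai-in-the-loop"
# Replace <x> with 2 or 3, or drop "-<x>" entirely to get llama-guard-multi-task.
subfolder = "llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)

# Tokenize one scammer turn and generate a reply of up to 100 new tokens.
inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))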

Training Details

Training Data

  • Classification: SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
  • Generation: SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
  • Auxiliary: ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset

Training Procedure

  • Fine-tuning setup:
    • 3 epochs, batch size = 8
    • LoRA rank = 8, α = 16
    • Mixed precision (bf16)
    • Optimizer: AdamW
  • Federated Learning (FL):
    • Simulated 10 clients, 30 rounds FedAvg
    • Optional Differential Privacy (noise multipliers: 0.1, 0.8)
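
The FedAvg aggregation step named above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' training code: the `fedavg` function, the `Weights` type, and the `lora_A` parameter name are hypothetical, weighting by local sample count is the standard FedAvg choice rather than something the card confirms, and the optional differential-privacy noise is not shown.

```python
from typing import Dict, List

# Parameter name -> flat list of values (e.g. flattened LoRA matrices).
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


if __name__ == "__main__":
    # Two toy clients; the second holds three times as much local data.
    clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
    print(fedavg(clients, client_sizes=[1, 3]))  # {'lora_A': [2.5, 3.5]}
```

In a simulation matching the setup above, this function would be called once per round on the 10 client updates, and the result broadcast back to the clients before the next of the 30 rounds.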

Evaluation

Metrics

  • Classification: F1, AUPRC, FPR, FNR
  • Generation: Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
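
For reference, several of the metrics above follow directly from their standard definitions. The sketch below computes F1, FPR, and FNR from binary labels (1 = scam) and Distinct-n from whitespace-tokenized responses. The function names and the tokenization are assumptions for illustration; the paper's evaluation scripts may differ.

```python
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """F1, false-positive rate, and false-negative rate for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # benign flagged as scam
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # scam missed
    }


def distinct_n(responses: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all responses."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
    print(distinct_n(["please send the code", "please send the money"], n=2))
```

AUPRC, DialogRPT, BERTScore, and ROUGE-L depend on ranking scores or pretrained scorers and are better taken from established libraries than reimplemented.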

Results

  • Classification: BiGRU/BiLSTM > 0.99 F1, RoBERTa competitive
  • Instruction-tuned LLMs: MD-Judge best overall (F1 = 0.89+), LlamaGuard3 strong for moderation
  • Generation: MD-Judge achieved lowest perplexity (22.3), highest engagement (0.79), 96% safety compliance in human evals

Environmental Impact

  • Hardware: NVIDIA H100 GPUs
  • Training Time: ~30 hrs across models
  • Federated Setup: 10 simulated clients, 30 rounds

Technical Specifications

  • Architecture: Instruction-tuned transformer (decoder-only)
  • Objective: Multi-task (classification, risk scoring, safe generation)

Citation

If you use these models, please cite our paper:

@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  url={https://arxiv.org/abs/2509.05362},
  year={2025}
}

Contact

