🧠 AI Text Detector v1.0 (DeBERTa-v3-large)

🏷️ Model Details

| Field | Description |
|---|---|
| Model Name | ai-text-detector-v-n4.0 |
| Base Model | microsoft/deberta-v3-large |
| Task | Text Classification (Human-written vs AI-generated) |
| Language | English |
| Framework | PyTorch, Transformers |
| Trained by | Abhinav |
| Fine-tuned using | Hugging Face Trainer API with early stopping, mixed precision (fp16), and F1 optimization |

📖 Model Description

This model is a fine-tuned DeBERTa-v3-large classifier that predicts whether a given text was written by a human or generated by an AI.
It was trained on a custom dataset of 10,000+ text samples spanning multiple topics, labeled as:

  • 0 → Human-written text
  • 1 → AI-generated text

The goal is to pick up the subtle linguistic and stylistic cues that separate natural human writing from machine-generated content.


⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 4 |
| Batch size | 8 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 256 |
| Optimizer | AdamW |
| Scheduler | Linear decay |
| Weight Decay | 0.01 |
| Seed | 42 |
| Mixed Precision | ✅ Yes (fp16) |
| Gradient Accumulation Steps | 2 |
| Frameworks | PyTorch, Transformers, Datasets |
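
As a rough sketch, the settings in the table could be passed to the Hugging Face `TrainingArguments` class as shown below. The original training script is not part of this card, so the names `HPARAMS`, `effective_batch_size`, `approx_optimizer_steps`, and `build_training_args`, the `output_dir` value, and the 9,000-sample figure (90% of a nominal 10,000 rows) are illustrative assumptions. The maximum sequence length of 256 is applied at tokenization time rather than here, and AdamW is left to the Trainer's default optimizer setting.

```python
import math

# Hyperparameters copied from the table above; the key names follow
# the keyword arguments of transformers.TrainingArguments.
HPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "lr_scheduler_type": "linear",
    "seed": 42,
    "fp16": True,  # mixed precision; requires a CUDA-capable GPU
}


def effective_batch_size(hp, num_devices=1):
    """Samples consumed per optimizer update."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


def approx_optimizer_steps(num_train_samples, hp, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    per_epoch = math.ceil(num_train_samples / effective_batch_size(hp, num_devices))
    return per_epoch * hp["num_train_epochs"]


def build_training_args(output_dir="./detector-out"):
    """Build TrainingArguments from HPARAMS (output_dir is a placeholder)."""
    # Imported lazily so the arithmetic helpers work without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(HPARAMS))
    print("approx. optimizer steps:", approx_optimizer_steps(9000, HPARAMS))
```

With a per-device batch of 8 and 2 accumulation steps, each optimizer update sees 16 samples on a single GPU.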

🧾 Dataset

| Field | Value |
|---|---|
| Source | Custom dataset (gpt_5_with_10k.csv) |
| Columns | text, label |
| Labels | 0 = Human, 1 = AI |
| Split | 90% Train / 10% Test |
| Cleaning | Removed special characters, normalized whitespace, and standardized punctuation |
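
The cleaning and splitting steps might look roughly like the sketch below. The card does not list the exact rules, so the character whitelist, the punctuation map, and the helper names `clean_text` and `split_rows` are assumptions, as is reusing seed 42 for the shuffle.

```python
import random
import re

# Assumed punctuation standardization: curly quotes and long dashes
# are mapped to their plain ASCII equivalents.
_PUNCT_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
})


def clean_text(text):
    """Standardize punctuation, drop special characters, collapse whitespace."""
    # Punctuation is mapped first so curly quotes are not stripped below.
    text = text.translate(_PUNCT_MAP)
    # Keep word characters, whitespace, and basic punctuation (assumed whitelist).
    text = re.sub(r"[^\w\s.,;:!?'\"()\-]", "", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle deterministically and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


if __name__ == "__main__":
    print(clean_text("  Hello,\u201cworld\u201d \u2014  it\u2019s  #1!\n"))
    train, test = split_rows(range(100))
    print(len(train), len(test))
```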

📊 Evaluation Results

| Metric | Score |
|---|---|
| Accuracy | ~0.97 |
| F1 Score | ~0.97 |
| Precision / Recall | Balanced |

Confusion Matrix Example:

[[4800   90]    → True Human
 [ 110 5000]]   → True AI
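
To show how the headline metrics relate to a confusion matrix, here is a minimal sketch that derives accuracy, precision, recall, and F1 from the example above. It assumes rows are true labels (as annotated) and columns are predicted labels in the same order (Human, AI), with AI treated as the positive class; the function name `metrics_from_confusion` is hypothetical.

```python
def metrics_from_confusion(cm):
    """Compute binary metrics from [[TN, FP], [FN, TP]] with AI as positive."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


if __name__ == "__main__":
    example = [[4800, 90], [110, 5000]]
    for name, value in metrics_from_confusion(example).items():
        print(f"{name}: {value:.4f}")
```

For the example matrix this works out to an accuracy of 0.98 and an F1 of about 0.98 for the AI class, close to the approximate scores in the table.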

🔍 Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "abhi099k/ai-text-detector-v-n4.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "This article explores the evolution of large language models..."
# Truncate to 256 tokens, the maximum sequence length used in training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item()

# Label mapping: 0 = Human, 1 = AI
print("🧑 Human" if pred == 0 else "🤖 AI")
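
Because the model was trained with a maximum sequence length of 256, longer inputs get truncated. One possible workaround, sketched here under stated assumptions, is to score overlapping windows of the text and average the results. The names `word_windows`, `mean_ai_probability`, and `score_fn` are hypothetical: `score_fn` stands for any callable that returns the AI-class probability for a string, for example a wrapper around the model call in the usage example. Word counts are only a rough proxy for token counts, so the window size of 150 words is a guess rather than a tuned value.

```python
def word_windows(text, window=150, stride=75):
    """Split text into overlapping windows of whitespace-separated words."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def mean_ai_probability(text, score_fn, window=150, stride=75):
    """Average score_fn over all windows of the text."""
    chunks = word_windows(text, window, stride)
    if not chunks:
        raise ValueError("text contains no words")
    scores = [score_fn(chunk) for chunk in chunks]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(400))
    print(len(word_windows(sample)), "windows")
    # Dummy scorer standing in for the real model call.
    print(mean_ai_probability(sample, lambda chunk: 0.5))
```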

📈 Intended Use

  • Detect AI-generated content for moderation, academic integrity, or authenticity verification.
  • Use as a starting checkpoint for further fine-tuning on domain-specific datasets (e.g., essays, reviews, research papers).

⚠️ Limitations

  • May misclassify paraphrased AI text or human text with robotic phrasing.
  • Primarily trained on English; accuracy on other languages is not guaranteed.
  • Should not be used for punitive or high-stakes decisions without human review.
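
One way to act on the last point is to route low-confidence predictions to a person instead of returning a hard label. The sketch below assumes the two logits from the model have been converted to a plain list (for example with `outputs.logits[0].tolist()`); the function names `softmax` and `triage` and the 0.9 threshold are illustrative assumptions, not values from the original training setup.

```python
import math

LABELS = {0: "Human", 1: "AI"}  # label mapping from this card


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, min_confidence=0.9):
    """Return (decision, confidence); defer to a human below the threshold."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    if probs[pred] < min_confidence:
        return "Needs human review", probs[pred]
    return LABELS[pred], probs[pred]


if __name__ == "__main__":
    for logits in ([0.1, 0.2], [-3.0, 3.0], [4.0, -4.0]):
        decision, confidence = triage(logits)
        print(logits, decision, round(confidence, 3))
```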

🏆 Future Improvements

  • Add multi-language support (Hindi, Spanish, etc.).
  • Add stylistic embeddings for cross-model generalization.
  • Test robustness against prompt engineering and obfuscation.

🧩 Technical Summary

| Component | Library |
|---|---|
| Tokenization | AutoTokenizer |
| Model | AutoModelForSequenceClassification |
| Trainer | transformers.Trainer |
| Metrics | evaluate (accuracy, f1) |
| Visualization | matplotlib (confusion matrix) |

📬 Citation

If you use this model, please cite:

@model{abhinav_ai_text_detector_v1,
  title     = {AI Text Detector v1.0 – DeBERTa-v3-large Fine-tune},
  author    = {Abhinav},
  year      = {2025},
  url       = {https://huggingface.co/Abhinav/ai-text-detector-v-n4.0}
}