🧠 AI Text Detector v1.0 (DeBERTa-v3-large)

🏷️ Model Details

Field	Description
Model Name	`ai-text-detector-v-n4.0`
Base Model	microsoft/deberta-v3-large
Task	Text Classification (Human-written vs AI-generated)
Language	English
Framework	PyTorch, Transformers
Trained by	Abhinav
Fine-tuned using	Hugging Face `Trainer` API with early stopping, mixed precision (fp16), and F1 optimization.

📖 Model Description

This model is a fine-tuned version of DeBERTa-v3-large that classifies a given text as human-written or AI-generated.
It was trained on a custom dataset containing 10,000+ samples of diverse text across multiple topics, labeled as:

0 → Human-written text
1 → AI-generated text

The goal is to identify subtle linguistic differences and stylistic cues between natural human writing and machine-generated content.

⚙️ Training Configuration

Parameter	Value
Epochs	4
Batch size	8
Learning Rate	2e-5
Max Sequence Length	256
Optimizer	AdamW
Scheduler	Linear decay
Weight Decay	0.01
Seed	42
Mixed Precision	✅ Yes (fp16)
Gradient Accumulation Steps	2
Frameworks	PyTorch, Transformers, Datasets
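
As a rough illustration of how these settings interact, the sketch below mirrors the table in a plain Python dataclass and derives the effective batch size and a linear-decay learning rate. The class and function names are hypothetical. The sketch also assumes a single device, no warmup, and a 9,000-example training split (90% of 10,000), none of which the card states explicitly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the training configuration table
    epochs: int = 4
    batch_size: int = 8
    learning_rate: float = 2e-5
    grad_accum_steps: int = 2
    weight_decay: float = 0.01
    seed: int = 42


def effective_batch_size(cfg: TrainConfig, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (single device assumed by default)."""
    return cfg.batch_size * cfg.grad_accum_steps * num_devices


def total_optimizer_steps(cfg: TrainConfig, num_train_examples: int) -> int:
    """Optimizer steps over the whole run, rounding partial batches up."""
    steps_per_epoch = math.ceil(num_train_examples / effective_batch_size(cfg))
    return steps_per_epoch * cfg.epochs


def linear_decay_lr(cfg: TrainConfig, step: int, total_steps: int) -> float:
    """Learning rate at `step`, decaying linearly from the peak to zero (no warmup assumed)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return cfg.learning_rate * remaining


cfg = TrainConfig()
total = total_optimizer_steps(cfg, num_train_examples=9_000)  # assumed split size
print(effective_batch_size(cfg))                 # 16
print(total)                                     # 2252
print(linear_decay_lr(cfg, total // 2, total))   # 1e-05 (halfway through the run)
```

With a batch size of 8 and 2 gradient accumulation steps, each optimizer update sees 16 examples on one device. The exact step count in a real `Trainer` run may differ slightly depending on how the library rounds the last partial batch.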

🧾 Dataset

Field	Value
Source	Custom dataset (`gpt_5_with_10k.csv`)
Columns	`text`, `label`
Labels	0 = Human, 1 = AI
Split	90% Train / 10% Test
Cleaning	Removed special characters, normalized whitespace, and standardized punctuation.
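
The card does not give the exact cleaning rules or how the split was drawn, so the following is only a minimal sketch of what such preprocessing might look like. The function names, the punctuation map, the set of characters kept, and the use of seed 42 for the split are all assumptions made for illustration.

```python
import random
import re

# Map common typographic punctuation to ASCII equivalents (assumed rule set)
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
})


def clean_text(text: str) -> str:
    """Standardize punctuation, drop other special characters, normalize whitespace."""
    text = text.translate(_PUNCT_MAP)
    # Keep letters, digits, whitespace, and basic punctuation; drop everything else
    text = re.sub(r"[^\w\s.,;:!?'\"()\-]", "", text)
    # Collapse runs of whitespace (including newlines and tabs) into one space
    return re.sub(r"\s+", " ", text).strip()


def train_test_split(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of `rows` deterministically and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


sample = "\u201cHello\u201d,\t world\u2026  it\u2019s  \u00a92025!"
print(clean_text(sample))          # "Hello", world... it's 2025!
train, test = train_test_split(range(100))
print(len(train), len(test))       # 90 10
```

Any cleaning applied at training time should also be applied at inference time, otherwise the model sees inputs that differ from its training distribution.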

📊 Evaluation Results

Metric	Score
Accuracy	~0.97
F1 Score	~0.97
Precision / Recall	Balanced

Confusion Matrix Example (rows are true labels, columns are predicted labels):

[[4800   90]    → True Human
 [ 110 5000]]   → True AI
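
To make the link between a confusion matrix and the reported metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from the example counts. It assumes that columns are predicted labels and that AI (label 1) is the positive class. These example counts give roughly 0.98 on each metric.

```python
# Example confusion matrix: rows are true labels, columns are predicted labels
# (0 = Human, 1 = AI), treating AI as the positive class (an assumption).
matrix = [[4800, 90],
          [110, 5000]]

tn, fp = matrix[0]  # true Human: predicted Human, predicted AI
fn, tp = matrix[1]  # true AI: predicted Human, predicted AI

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9800 precision=0.9823 recall=0.9785 f1=0.9804
```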

🔍 Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "abhi099k/ai-text-detector-v-n4.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # make inference mode explicit (dropout off)

text = "This article explores the evolution of large language models..."
# Truncate to 256 tokens to match the training max sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item()

# Label mapping: 0 = Human, 1 = AI
print("🧑 Human" if pred == 0 else "🤖 AI")
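
The example above reports only the winning label. If a confidence score is also wanted, one option is to apply a softmax to the two logits. The sketch below does this in plain Python so it runs without the model; the helper names and the sample logits are hypothetical.

```python
import math

LABELS = {0: "Human", 1: "AI"}  # label mapping from the model description


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return (label, probability) for the most likely class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Hypothetical logits; with the real model use outputs.logits[0].tolist()
label, prob = describe([-1.2, 2.3])
print(f"{label} ({prob:.1%})")
```

Softmax scores from a fine-tuned classifier are not necessarily well calibrated, so they are better read as a relative ranking than as a true probability.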

📈 Intended Use

Detect AI-generated content for moderation, academic integrity, or authenticity verification.
Use as a starting checkpoint for further fine-tuning on domain-specific datasets (e.g., essays, reviews, research papers).

⚠️ Limitations

May misclassify paraphrased AI text or human text with robotic phrasing.
Trained primarily on English text; performance on other languages is not guaranteed.
Should not be used for punitive or high-stakes decisions without human review.
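
One way to act on these limitations is to treat the model's output as a signal for triage rather than a verdict. The sketch below maps an AI-class probability to one of three actions. The function name, the action strings, and the 0.2 / 0.8 thresholds are hypothetical and would need tuning on real data.

```python
def triage(p_ai: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map an AI-class probability to an action; thresholds are illustrative only."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    if p_ai >= high:
        return "flag_for_human_review"   # likely AI, but a person makes the final call
    if p_ai <= low:
        return "likely_human"
    return "uncertain"                   # neither label is reliable enough to act on


for p in (0.05, 0.5, 0.95):
    print(p, triage(p))
```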

🏆 Future Improvements

Add multi-language support (Hindi, Spanish, etc.).
Add stylistic embeddings for cross-model generalization.
Test robustness against prompt engineering and obfuscation.

🧩 Technical Summary

Component	Library
Tokenization	`AutoTokenizer`
Model	`AutoModelForSequenceClassification`
Trainer	`transformers.Trainer`
Metrics	`evaluate` (accuracy, f1)
Visualization	`matplotlib` (confusion matrix)

📬 Citation

If you use this model, please cite:

@misc{abhinav_ai_text_detector_v1,
  title     = {AI Text Detector v1.0 – DeBERTa-v3-large Fine-tune},
  author    = {Abhinav},
  year      = {2025},
  url       = {https://huggingface.co/abhi099k/ai-text-detector-v-n4.0}
}
