AI Text Detector v1.0 (DeBERTa-v3-large)
Model Details
Field | Description |
---|---|
Model Name | ai-text-detector-v-n4.0 |
Base Model | microsoft/deberta-v3-large |
Task | Text Classification (Human-written vs AI-generated) |
Language | English |
Framework | PyTorch, Transformers |
Trained by | Abhinav |
Fine-tuned using | Hugging Face Trainer API with early stopping, mixed precision (fp16), and F1 optimization. |
Model Description
This model fine-tunes DeBERTa-v3-large to classify whether a given text was written by a human or generated by an AI.
It was trained on a custom dataset of 10,000+ text samples spanning multiple topics, each labeled as:
- 0 → Human-written text
- 1 → AI-generated text
The goal is to identify subtle linguistic differences and stylistic cues between natural human writing and machine-generated content.
Training Configuration
Parameter | Value |
---|---|
Epochs | 4 |
Batch size | 8 |
Learning Rate | 2e-5 |
Max Sequence Length | 256 |
Optimizer | AdamW |
Scheduler | Linear decay |
Weight Decay | 0.01 |
Seed | 42 |
Mixed Precision | Yes (fp16) |
Gradient Accumulation Steps | 2 |
Frameworks | PyTorch, Transformers, Datasets |
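As a rough illustration of how the table above could map onto the Trainer API, the sketch below collects the listed values under `transformers.TrainingArguments` field names and derives the effective batch size. The output directory, the helper names, the single-device setup, and the 9,000-sample training set size (90% of exactly 10,000) are assumptions made for the example, not details from the original training script. The maximum sequence length of 256 is not a `TrainingArguments` field; it would be applied at tokenization time.

```python
import math


def build_training_kwargs(output_dir: str = "./detector-checkpoints") -> dict:
    """Hyperparameters from the table, keyed by TrainingArguments field names."""
    return {
        "output_dir": output_dir,          # hypothetical path, not from the card
        "num_train_epochs": 4,
        "per_device_train_batch_size": 8,
        "gradient_accumulation_steps": 2,
        "learning_rate": 2e-5,
        "weight_decay": 0.01,
        "lr_scheduler_type": "linear",     # linear decay
        "seed": 42,
        "fp16": True,                      # generally requires a CUDA GPU
    }


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (assumes num_devices GPUs)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def approx_optimizer_steps(num_train_samples: int, kwargs: dict) -> int:
    """Approximate total optimizer steps; Trainer's own count may differ slightly."""
    per_epoch = math.ceil(num_train_samples / effective_batch_size(kwargs))
    return per_epoch * kwargs["num_train_epochs"]


def make_training_arguments():
    """Build the real object; needs `transformers` installed (imported lazily)."""
    from transformers import TrainingArguments

    return TrainingArguments(**build_training_kwargs())


kwargs = build_training_kwargs()
print(effective_batch_size(kwargs))            # 8 * 2 = 16
print(approx_optimizer_steps(9_000, kwargs))   # ceil(9000 / 16) * 4 = 2252
```

With gradient accumulation, each optimizer update sees 16 samples even though only 8 fit in a single forward pass, which is the usual reason to combine a small per-device batch with accumulation on a large model.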
Dataset
Field | Value |
---|---|
Source | Custom dataset (gpt_5_with_10k.csv) |
Columns | text, label |
Labels | 0 = Human, 1 = AI |
Split | 90% Train / 10% Test |
Cleaning | Removed special characters, normalized whitespace, and standardized punctuation. |
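The card does not spell out the exact cleaning rules or how the split was drawn, so the following is only one plausible reading of the table: a minimal standard-library sketch that standardizes common punctuation, drops non-ASCII symbols, collapses whitespace, and makes a shuffled 90/10 split. The function names, the specific character replacements, the interpretation of "special characters" as non-ASCII, and the reuse of seed 42 for the split are all assumptions.

```python
import csv
import random
import re

# Assumed punctuation mapping; the original rules are not documented.
PUNCT_MAP = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis character
}


def clean_text(text: str) -> str:
    """Standardize punctuation, strip non-ASCII symbols, normalize whitespace."""
    for src, dst in PUNCT_MAP.items():
        text = text.replace(src, dst)
    # Keep printable ASCII and whitespace; drop everything else.
    text = re.sub(r"[^\x20-\x7E\s]", "", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def load_rows(path: str) -> list:
    """Read a CSV with `text` and `label` columns into cleaned row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"text": clean_text(row["text"]), "label": int(row["label"])}
            for row in csv.DictReader(f)
        ]


def split_rows(rows, test_fraction: float = 0.10, seed: int = 42):
    """Shuffle deterministically and return (train, test). Not stratified."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]
```

A typical call would be `train, test = split_rows(load_rows("gpt_5_with_10k.csv"))`. Whether the original split was stratified by label is not stated; this sketch is not.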
Evaluation Results
Metric | Score |
---|---|
Accuracy | ~0.97 |
F1 Score | ~0.97 |
Precision / Recall | Balanced |
Confusion Matrix Example:
[[4800   90]   → True Human
 [ 110 5000]]  → True AI
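To make the link between the matrix and the headline metrics explicit, the sketch below derives accuracy, precision, recall, and F1 from the example counts. It assumes rows are true classes and columns are predicted classes, both ordered (Human, AI), with AI as the positive class; the column order is not stated in the card. The example counts work out to roughly 0.98 on each metric, close to the rounded scores in the table; since the matrix is labeled an example, a small difference is unsurprising.

```python
def binary_metrics(cm):
    """Metrics from a 2x2 confusion matrix [[TN, FP], [FN, TP]].

    Assumes rows are true labels and columns are predictions,
    ordered (Human, AI), with AI (label 1) as the positive class.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


example_cm = [[4800, 90], [110, 5000]]
for name, value in binary_metrics(example_cm).items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9800
# precision: 0.9823
# recall: 0.9785
# f1: 0.9804
```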
Usage Example
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_id = "abhi099k/ai-text-detector-v-n4.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout
text = "This article explores the evolution of large language models..."
# Truncate to the 256-token limit used during training
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)
# No gradients are needed for prediction
with torch.no_grad():
    outputs = model(**inputs)
# Label 0 = Human, 1 = AI
pred = torch.argmax(outputs.logits, dim=1).item()
print("Human" if pred == 0 else "AI")
Intended Use
- Detect AI-generated content for moderation, academic integrity, or authenticity verification.
- Use as a foundation model for fine-tuning on domain-specific datasets (e.g., essays, reviews, research papers).
Limitations
- May misclassify paraphrased AI text or human text with robotic phrasing.
- Primarily trained on English; performance on other languages is not guaranteed.
- Should not be used for punitive or high-stakes decisions without human review.
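One way to act on these caveats is to treat the model output as a signal, not a verdict, and flag low-confidence predictions for a human reviewer. The sketch below is a minimal illustration under stated assumptions: the `triage` helper and the 0.90 threshold are hypothetical, and softmax scores from a fine-tuned classifier are not calibrated probabilities, so any threshold would need validating on held-out data.

```python
import math

LABELS = {0: "Human", 1: "AI"}  # label mapping from the model card


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, confident_at: float = 0.90) -> dict:
    """Turn two-class logits into a label plus a human-review flag.

    `confident_at` is an illustrative threshold, not a tuned value.
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred]
    return {
        "label": LABELS[pred],
        "confidence": confidence,
        "needs_review": confidence < confident_at,
    }


print(triage([2.0, -1.0]))   # clear margin: labeled Human, no review flag
print(triage([0.2, 0.5]))    # narrow margin: labeled AI, flagged for review
```

With the usage example's variables, the logits for a single text could be passed as `triage(outputs.logits[0].tolist())`. For high-stakes uses, review would still apply to confident predictions as well.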
Future Improvements
- Add multi-language support (Hindi, Spanish, etc.).
- Add stylistic embeddings for cross-model generalization.
- Test robustness against prompt engineering and obfuscation.
Technical Summary
Component | Library |
---|---|
Tokenization | AutoTokenizer |
Model | AutoModelForSequenceClassification |
Trainer | transformers.Trainer |
Metrics | evaluate (accuracy, f1) |
Visualization | matplotlib (confusion matrix) |
Citation
If you use this model, please cite:
@misc{abhinav_ai_text_detector_v1,
title = {AI Text Detector v1.0: DeBERTa-v3-large Fine-tune},
author = {Abhinav},
year = {2025},
url = {https://huggingface.co/abhi099k/ai-text-detector-v-n4.0}
}