XenArcAI
AIRealNet
Model type: Image Classification (Binary)
Task: AI-generated vs Human image detection
Base model: microsoft/swinv2-tiny-patch4-window16-256
Fine-tuned on: Parveshiiii/AI-vs-Real dataset (open-source split of the main dataset)
Overview
In an era of rapidly advancing AI-generated imagery, deepfakes, and synthetic media, the need for reliable detection tools has never been greater. XenArcAI/AIRealNet is a binary image classifier designed to distinguish AI-generated images from real human photographs. The model is optimized to detect conventional AI-generated content while adhering to strict privacy standards: no personal or sensitive images were used in training.
- Class 0: AI-generated image
- Class 1: Real human image
By leveraging the robust SwinV2 Tiny architecture as its backbone, AIRealNet achieves a high degree of accuracy while remaining lightweight enough for practical deployment.
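The class-to-label mapping can be verified directly from the checkpoint's config; a minimal sketch, assuming the repository ships a standard transformers config with id2label set:

```python
from transformers import AutoConfig

# Fetch the checkpoint's config and inspect the label mapping.
config = AutoConfig.from_pretrained("XenArcAI/AIRealNet")
print(config.id2label)  # expected: {0: 'artificial', 1: 'real'}
```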
Key Features
High Accuracy on Public Datasets: Despite being fine-tuned on a 14k-image split (part of the main fine-tuning set), AIRealNet demonstrates exceptional accuracy and robustness in detecting AI-generated images.
Near-Balanced Training Split: The two classes are close enough in size to limit class-imbalance issues:
- AI-generated: 60%
- Human images: 40%
Ethical Design: No personal photos were included, even if edited or AI-modified, respecting privacy and ethical AI principles.
Fast and Scalable: Built on a vision transformer backbone, AIRealNet can be deployed efficiently in both research and production environments.
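The "lightweight" claim can be sanity-checked by counting parameters; a minimal sketch (the ~28M figure is the typical size of a SwinV2 Tiny backbone, not a number reported by this card):

```python
from transformers import AutoModelForImageClassification

# Download the model and count its parameters.
model = AutoModelForImageClassification.from_pretrained("XenArcAI/AIRealNet")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # SwinV2 Tiny is on the order of ~28M
```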
Training Data
- Dataset: Parveshiiii/AI-vs-Real (open-sourced subset of the main dataset)
- Size: 14k images (roughly 60% AI-generated, 40% human)
- Split: Used the train split for fine-tuning; validation performed on a separate balanced subset.
- Notes: Images sourced from public datasets and AI generation tools. Edited personal photos were intentionally excluded.
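For experimentation, the subset can be pulled from the Hugging Face Hub with the datasets library; a minimal sketch (the "train" split name follows this card, but verify the column schema on the dataset page):

```python
from datasets import load_dataset

# Load the open-sourced fine-tuning subset from the Hugging Face Hub.
ds = load_dataset("Parveshiiii/AI-vs-Real", split="train")
print(ds)  # inspect the actual column names before training on it
```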
Limitations
While AIRealNet performs exceptionally well on typical AI-generated images, users should note:
- Subtle Edits: The model struggles with small, highly localized, or ultra-precise modifications, such as "Nano Banana"-style edits.
- Edited Personal Images: AI-modified photos of real people are not detected; this is by design, in line with the privacy and ethical guidelines above.
- Domain Generalization: Performance may vary on images from completely unseen AI generators or extremely unconventional content.
Performance Metrics
Metrics shown are from Epoch 2, chosen to illustrate stable performance after fine-tuning.
Note: The extremely low loss and high accuracy reflect the controlled dataset environment; real-world performance may be lower depending on the image domain. In our testing, the model is highly accurate on fully generated images, including edited ones, but it cannot detect Nano Banana-style edits.
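To reproduce accuracy numbers on your own held-out data, a hedged evaluation sketch follows; the "test" split name and the assumption that the dataset's label names match the pipeline's 'artificial'/'real' labels are both unverified here:

```python
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline("image-classification", model="XenArcAI/AIRealNet")

# "test" as the held-out split and the "image"/"label" columns are assumptions.
ds = load_dataset("Parveshiiii/AI-vs-Real", split="test")

correct = 0
for example in ds:
    pred = pipe(example["image"])[0]["label"]              # top-1 predicted label
    gold = ds.features["label"].int2str(example["label"])  # integer label -> name
    correct += int(pred == gold)

print(f"accuracy: {correct / len(ds):.4f}")
```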
Demo and Usage
- Installing dependencies

```bash
pip install -U transformers
```
- Loading and running a demo
```python
from transformers import pipeline

# Load the classifier from the Hugging Face Hub.
pipe = pipeline("image-classification", model="XenArcAI/AIRealNet")

# The pipeline accepts a URL, a local file path, or a PIL image.
pipe("https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/eVkKUTdiInUl6pbIUghQC.png")  # example image
```
Demo
- Given image: the example URL above, a promotional graphic for XenArcAI's math reasoning dataset
- Model Output
```python
[{'label': 'artificial', 'score': 0.9865425825119019},
 {'label': 'real', 'score': 0.013457471504807472}]
```
Note: the prediction is correct, as the image was generated by a diffusion model.
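To turn these per-label scores into a single binary decision, a score threshold can be applied to the 'artificial' class; a minimal sketch reusing the `pipe` from the demo above (the 0.5 default is an assumption to tune against your false-positive tolerance):

```python
def is_ai_generated(image, threshold=0.5):
    # Collect per-label scores from the pipeline output.
    scores = {r["label"]: r["score"] for r in pipe(image)}
    # Flag the image when the 'artificial' score clears the threshold.
    return scores.get("artificial", 0.0) >= threshold
```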
Intended Use
- Detect AI-generated imagery on social media, research publications, and digital media platforms.
- Assist content moderators, researchers, and fact-checkers in identifying synthetic media.
- Not intended for legal verification without human corroboration.
Ethical Considerations
- Privacy-first Approach: Personal photos, even if AI-edited, were excluded.
- Responsible Deployment: Users should combine model predictions with human review to avoid false positives or negatives.
- Transparency: The model card openly communicates its limitations and dataset design to prevent misuse.
How It Works
- Images are preprocessed and resized to 256x256.
- Features are extracted using the SwinV2 Tiny vision transformer backbone.
- A binary classification head outputs probabilities for AI-generated vs real human images.
- Predictions are interpreted as class 0 (AI) or class 1 (Human).
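These steps can also be reproduced without the pipeline wrapper; a minimal sketch, assuming the checkpoint ships a standard transformers image processor (the local filename is hypothetical):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# The processor applies the 256x256 resize and normalization described above.
processor = AutoImageProcessor.from_pretrained("XenArcAI/AIRealNet")
model = AutoModelForImageClassification.from_pretrained("XenArcAI/AIRealNet")

image = Image.open("example.png")  # hypothetical local file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the two classes; index 0 = AI-generated, index 1 = real human.
probs = logits.softmax(dim=-1)[0]
print(model.config.id2label[int(probs.argmax())], probs.tolist())
```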
Future Work
Future iterations aim to:
- Improve detection of subtle AI-generated edits and "Nano Banana"-style modifications.
- Expand training data with diverse AI generators to enhance generalization.
- Explore multi-modal detection capabilities (e.g., video, metadata, and image combined).
References
- Microsoft SwinV2 Tiny: https://github.com/microsoft/Swin-Transformer
- Parveshiiii/AI-vs-Real dataset (open-sourced subset of the main dataset), published by a XenArcAI team member