---
library_name: pytorch
pipeline_tag: image-classification
tags:
  - vision-transformer
  - age-estimation
  - gender-classification
  - face-analysis
  - computer-vision
  - pytorch
  - transformers
  - multi-task-learning
language:
  - en
license: apache-2.0
datasets:
  - UTKFace
metrics:
  - accuracy
  - mae
model-index:
  - name: ViT-Age-Gender-Elite
    results:
      - task:
          type: image-classification
          name: Gender Classification
        dataset:
          name: UTKFace
          type: face-analysis
        metrics:
          - type: accuracy
            value: 94.3
            name: Gender Accuracy
          - type: mae
            value: 4.5
            name: Age MAE (years)
---

πŸ† ViT-Age-Gender-Elite: Vision Transformer for Facial Analysis

βœ… MODEL WEIGHTS NOW AVAILABLE - Trained model weights uploaded and ready for use!

## 🎯 Quick Usage

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor
from model import AgeGenderViTModel  # Use the model.py from this repo

# Load the trained weights (CPU-safe; move to GPU afterwards if desired)
model = AgeGenderViTModel()
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Load the image processor that matches the ViT backbone
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Predict on an image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    age_pred, gender_pred = model(inputs["pixel_values"])

age = int(age_pred.item())
gender = "Female" if gender_pred.item() > 0.5 else "Male"
confidence = gender_pred.item() if gender_pred.item() > 0.5 else 1 - gender_pred.item()

print(f"Age: {age} years, Gender: {gender}, Confidence: {confidence:.1%}")
```
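
For larger workloads, the same model can be moved to GPU and fed a batch of images in a single forward pass. This is a minimal sketch that reuses the `model` and `processor` loaded above and assumes the forward pass in `model.py` accepts batched `pixel_values` (standard for an `nn.Module` built on a ViT backbone):

```python
# Optional: batched inference, on GPU when available (reuses `model` and `processor` from above)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

paths = ["face1.jpg", "face2.jpg"]  # hypothetical example files
batch = processor(images=[Image.open(p).convert("RGB") for p in paths], return_tensors="pt")

with torch.no_grad():
    ages, genders = model(batch["pixel_values"].to(device))

for path, age, p_female in zip(paths, ages, genders):
    print(f"{path}: ~{int(age.item())} years, {'Female' if p_female.item() > 0.5 else 'Male'}")
```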

πŸ† Performance Achievements

  • βœ… 94.3% Gender Accuracy - ELITE tier performance
  • βœ… 4.5 Years Age MAE - Research-grade precision
  • βœ… 86.8M Parameters - Optimally fine-tuned Vision Transformer
  • βœ… Production Ready - Stable, consistent results

## 📊 Dataset & Training Details

### Training Dataset: UTKFace

- Total Images: 23,687 facial images
- Age Range: 1-100 years
- Demographics: Balanced gender distribution (52.3% Male, 47.7% Female)
- Quality: High-resolution images with diverse lighting and pose conditions

### ⚠️ Important Dataset Characteristics

The UTKFace dataset has a specific age distribution:

- Adults (21-50 years): ~70% of data (majority)
- Young Adults (16-30 years): ~20% of data
- Children (0-15 years): ~5% of data (limited)
- Seniors (50+ years): ~5% of data

## 🎯 Model Performance by Age Group

- Excellent: Adults and young adults (16-60 years) - 94.3% gender accuracy
- Good: Teenagers (13-20 years) - ~90% accuracy
- Limited: Children (0-12 years) - Reduced accuracy due to limited training data
- Good: Seniors (60+ years) - ~85% accuracy

## 🔄 Upcoming Improvements

### Version 2.0 - Enhanced Children Support (In Development)

- 🎯 Training on FairFace Dataset - Better age distribution
- 👶 Children-Specific Fine-tuning - Focused training on ages 0-15
- 📊 APPA-REAL Integration - Inclusion of the apparent-age dataset
- 🎨 Data Augmentation - Synthetic generation of children's faces

### Planned Enhancements

- Multi-Age Ensemble: Specialized models for different age ranges
- Cross-Cultural Training: Enhanced performance across ethnicities
- Age-Specific Confidence: Different confidence thresholds per age group
- Real-time Optimization: Mobile and edge device deployment

## 📈 Current Model Strengths

### Best Use Cases

- ✅ Adult demographic analysis (primary strength)
- ✅ Social media content filtering (teen/adult classification)
- ✅ Marketing analytics (adult age segmentation)
- ✅ Security applications (adult age verification)

### Architecture Advantages

- Vision Transformer: Superior to CNN-based approaches
- Multi-task Learning: Joint age and gender optimization (see the loss sketch after this list)
- Transfer Learning: Built on google/vit-base-patch16-224
- Robust Features: Handles various lighting and pose conditions
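
The joint optimization is typically implemented as a weighted sum of an age-regression loss and a gender-classification loss. The exact loss terms and weights used to train this model are not documented in this card, so the following is only an illustrative sketch that assumes equal weighting and a sigmoid gender output:

```python
import torch.nn as nn

# Illustrative multi-task objective (assumed, not the published training recipe):
# MSE on the age-regression head + binary cross-entropy on the gender head.
age_criterion = nn.MSELoss()
gender_criterion = nn.BCELoss()  # assumes the gender head outputs a probability

def multitask_loss(age_pred, gender_pred, age_true, gender_true,
                   age_weight=1.0, gender_weight=1.0):
    """Weighted sum of the two task losses; the weights here are placeholders."""
    return (age_weight * age_criterion(age_pred, age_true.float())
            + gender_weight * gender_criterion(gender_pred, gender_true.float()))
```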

## 📊 Technical Specifications

### Model Architecture

- Base: google/vit-base-patch16-224
- Parameters: 86.8M total
- Input: 224×224 RGB images
- Outputs: Age (regression) + Gender (binary classification) - see the sketch after this list
- Attention Heads: 12
- Transformer Layers: 12
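
The authoritative architecture is defined in `model.py` in this repo. As a rough sketch of the design described above (ViT-Base backbone with an age-regression head and a gender-classification head), an implementation could look like the following; the head shapes and sigmoid output are assumptions, so use the repo's `model.py` for actual inference:

```python
import torch.nn as nn
from transformers import ViTModel

class AgeGenderViTSketch(nn.Module):
    """Illustrative two-head ViT: age regression + binary gender classification."""

    def __init__(self, backbone_name="google/vit-base-patch16-224"):
        super().__init__()
        self.backbone = ViTModel.from_pretrained(backbone_name)
        hidden = self.backbone.config.hidden_size   # 768 for ViT-Base
        self.age_head = nn.Linear(hidden, 1)        # predicted age in years
        self.gender_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())  # assumed P(female)

    def forward(self, pixel_values):
        # Use the [CLS] token embedding as the pooled image representation
        cls_token = self.backbone(pixel_values=pixel_values).last_hidden_state[:, 0]
        return self.age_head(cls_token).squeeze(-1), self.gender_head(cls_token).squeeze(-1)
```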

### Training Configuration

- Epochs: 15 (fully converged)
- Optimizer: AdamW (lr=2e-5) - a minimal training-step sketch follows this list
- Batch Size: 32
- Training Time: 2.95 hours on GPU
- Validation Split: 80/20 stratified
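
A minimal training step consistent with the configuration above (AdamW at lr=2e-5, batch size 32) might look like the sketch below. The dataloader and the reuse of the illustrative `multitask_loss` from earlier are assumptions, not the actual training script; `training_logs.json` holds the recorded run.

```python
from torch.optim import AdamW

# Illustrative training loop body; `train_loader` is assumed to yield
# (pixel_values, age, gender) batches, and `multitask_loss` is the
# illustrative loss sketched earlier in this card.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()

for pixel_values, age_true, gender_true in train_loader:
    optimizer.zero_grad()
    age_pred, gender_pred = model(pixel_values)
    loss = multitask_loss(age_pred, gender_pred, age_true, gender_true)
    loss.backward()
    optimizer.step()
```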

## 📊 Files Included

- `pytorch_model.bin` - Trained model weights (331 MB)
- `config.json` - Model configuration and metadata
- `training_logs.json` - Complete training history and metrics (see the sketch after this list)
- `model.py` - Model architecture and usage code
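
Both `config.json` and `training_logs.json` are plain JSON and can be inspected directly; the exact keys they contain are not documented here, so this is just a generic sketch:

```python
import json

# Inspect the bundled metadata files (assumes they sit next to pytorch_model.bin)
with open("config.json") as f:
    config = json.load(f)
with open("training_logs.json") as f:
    logs = json.load(f)

print(config)          # model configuration and metadata
print(list(logs)[:5])  # a first peek at the training history
```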

## ⚠️ Usage Recommendations

### Optimal Performance

- Primary Use: Adults and young adults (16-60 years)
- High Confidence: Gender classification across all ages
- Reasonable Accuracy: Age estimation for adults

### Limitations to Consider

- Children (0-12 years): Limited training data may affect accuracy
- Very elderly (70+ years): Fewer training examples
- Extreme poses/lighting: May reduce performance

### Best Practices

- Face Detection: Ensure clear, front-facing faces
- Image Quality: Use good lighting and resolution
- Age Context: Consider model strengths for your use case
- Confidence Thresholds: Adjust based on your application needs (see the sketch after this list)
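
One way to apply application-specific confidence thresholds is to flag low-confidence gender predictions as uncertain. The threshold value below is an assumed example, not a recommendation from this card:

```python
# Illustrative post-processing for the outputs of the Quick Usage example above.
GENDER_CONFIDENCE_THRESHOLD = 0.80  # assumed example value; tune per application

def interpret_prediction(age_pred, gender_pred, threshold=GENDER_CONFIDENCE_THRESHOLD):
    """Map raw model outputs to labels, flagging low-confidence gender calls."""
    p_female = float(gender_pred)
    gender = "Female" if p_female > 0.5 else "Male"
    confidence = p_female if p_female > 0.5 else 1.0 - p_female
    if confidence < threshold:
        gender = "Uncertain"
    return {"age": int(float(age_pred)), "gender": gender, "confidence": confidence}
```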

## 🔬 Research & Citation

```bibtex
@misc{vit-age-gender-elite-2025,
  title={ViT-Age-Gender-Elite: Vision Transformer for Facial Analysis},
  author={Abhilash Sahoo},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/abhilash88/ViT-Age-Gender-Elite}
}
```

## 🤝 Contributing & Feedback

We welcome contributions and feedback, especially:

- Children dataset suggestions for Version 2.0
- Performance evaluations on diverse datasets
- Use case feedback for model improvements
- Technical optimizations and enhancements

## 📈 Roadmap

- Q1 2025: Children-focused fine-tuning (Version 2.0)
- Q2 2025: Multi-cultural dataset integration
- Q3 2025: Mobile optimization and edge deployment
- Q4 2025: Real-time video analysis capabilities

Current Version: 1.0 (Adult-focused) | Next Version: 2.0 (Children-enhanced) | Status: Production Ready*

*Best performance on adults (16-60 years). Children support will be improved in the upcoming Version 2.0.