---
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- vision
- image-classification
- facial-expression-recognition
- emotion-detection
- pytorch
- transformers
datasets:
- FER2013
metrics:
- accuracy
pipeline_tag: image-classification
widget:
- src: https://images.unsplash.com/photo-1507003211169-0a1dd7228f2d?w=300&h=300&fit=crop&crop=face
example_title: Happy Face
- src: https://images.unsplash.com/photo-1457131760772-7017c6180f05?w=300&h=300&fit=crop&crop=face
example_title: Sad Face
- src: https://images.unsplash.com/photo-1506794778202-cad84cf45f1d?w=300&h=300&fit=crop&crop=face
example_title: Serious Face
---
# 🎭 ViT Facial Expression Recognition
This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) for facial expression recognition on the FER2013 dataset.
## πŸ“Š Model Performance
- **Accuracy**: 71.55%
- **Dataset**: FER2013 (35,887 images)
- **Training Time**: ~20 minutes on GPU
- **Architecture**: Vision Transformer (ViT-Base)
## 🎯 Supported Emotions
The model classifies faces into one of seven emotions:
1. **Angry** 😠
2. **Disgust** 🀒
3. **Fear** 😨
4. **Happy** 😊
5. **Sad** 😒
6. **Surprise** 😲
7. **Neutral** 😐
## πŸš€ Quick Start
```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch
# Load model and processor
processor = ViTImageProcessor.from_pretrained('abhilash88/face-emotion-detection')
model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')
# Load and preprocess image
image = Image.open('path_to_your_image.jpg').convert('RGB')  # ensure 3-channel input
inputs = processor(images=image, return_tensors="pt")
# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
# Emotion classes
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
predicted_emotion = emotions[predicted_class]
confidence = predictions[0][predicted_class].item()
print(f"Predicted Emotion: {predicted_emotion} ({confidence:.2f})")
```
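For one-line inference, the same checkpoint should also work with the `pipeline` API, assuming the uploaded config includes the `id2label` mapping for the seven emotions (a minimal sketch, not verified against the repo):

```python
from transformers import pipeline

# Image-classification pipeline; label names come from the model's config
# (assumes the checkpoint's id2label maps indices to the seven emotions above).
classifier = pipeline("image-classification", model="abhilash88/face-emotion-detection")

results = classifier("path_to_your_image.jpg", top_k=3)
for r in results:
    print(f"{r['label']}: {r['score']:.2f}")
```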
## πŸ“Έ Example Predictions
Here are some example predictions on real faces:
### Smiling person
- **True Emotion**: Happy
- **Predicted**: Happy
- **Confidence**: 0.85
![Example](examples/example_1_happy.jpg)
### Person looking sad
- **True Emotion**: Sad
- **Predicted**: Sad
- **Confidence**: 0.40
![Example](examples/example_2_sad.jpg)
### Serious expression
- **True Emotion**: Angry
- **Predicted**: Neutral
- **Confidence**: 0.92
![Example](examples/example_3_angry.jpg)
### Surprised expression
- **True Emotion**: Surprise
- **Predicted**: Neutral
- **Confidence**: 0.69
![Example](examples/example_4_surprise.jpg)
### Concerned look
- **True Emotion**: Fear
- **Predicted**: Happy
- **Confidence**: 0.85
![Example](examples/example_5_fear.jpg)
### Neutral expression
- **True Emotion**: Neutral
- **Predicted**: Happy
- **Confidence**: 0.58
![Example](examples/example_6_neutral.jpg)
### Unpleasant expression
- **True Emotion**: Disgust
- **Predicted**: Neutral
- **Confidence**: 0.97
![Example](examples/example_7_disgust.jpg)
## πŸ‹οΈ Training Details
### Training Hyperparameters
- **Learning Rate**: 5e-5
- **Batch Size**: 16
- **Epochs**: 3
- **Optimizer**: AdamW
- **Weight Decay**: 0.01
- **Scheduler**: Linear with warmup
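A minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments`; the original training script was not released, so `output_dir` and `warmup_ratio` below are assumptions:

```python
from transformers import TrainingArguments

# Hyperparameters from the list above; AdamW is the Trainer's default optimizer.
training_args = TrainingArguments(
    output_dir="vit-fer2013",          # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,                  # assumed; the card only says "linear with warmup"
)
```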
### Training Results
```
Epoch 1: Loss: 0.917, Accuracy: 66.90%
Epoch 2: Loss: 0.609, Accuracy: 69.32%
Epoch 3: Loss: 0.316, Accuracy: 71.55%
```
### Data Preprocessing
- **Image Resize**: 224x224 pixels
- **Normalization**: ImageNet stats
- **Data Augmentation**:
- Random horizontal flip
- Random rotation (Β±15Β°)
- Color jitter
- Random translation
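A minimal torchvision sketch of this preprocessing pipeline; the exact augmentation strengths (jitter factors, translation range) were not published, so those values are assumptions:

```python
from torchvision import transforms

# ImageNet normalization stats, as stated in the card.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),                              # ±15°
    transforms.ColorJitter(brightness=0.2, contrast=0.2),       # assumed strength
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # assumed range
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```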
## πŸ“ˆ Performance Analysis
The model achieves solid performance on FER2013, which is known to be a challenging dataset due to:
- Low resolution images (48x48 upscaled to 224x224)
- Crowdsourced labels with some noise
- High variation in lighting and pose
### Accuracy by Emotion Class:
- **Happy**: ~86% (best performing)
- **Surprise**: ~84%
- **Neutral**: ~83%
- **Angry**: ~82%
- **Sad**: ~79%
- **Fear**: ~75%
- **Disgust**: ~68% (most challenging)
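Per-class figures like these can be computed with a standard classification report over the test split. The sketch below reuses `model` and `processor` from the Quick Start and assumes hypothetical `test_images` / `test_labels` collections; it is not the original evaluation script:

```python
import torch
from sklearn.metrics import classification_report

emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# test_images: list of PIL images, test_labels: list of integer labels (hypothetical).
# For the full 3,589-image test set, run this in batches rather than one pass.
inputs = processor(images=test_images, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
preds = logits.argmax(dim=-1).tolist()

print(classification_report(test_labels, preds, target_names=emotions))
```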
## πŸ”§ Technical Details
### Model Architecture
- **Base Model**: google/vit-base-patch16-224
- **Parameters**: ~86M
- **Input Size**: 224x224x3
- **Patch Size**: 16x16
- **Number of Layers**: 12
- **Hidden Size**: 768
- **Attention Heads**: 12
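These figures can be sanity-checked directly from the loaded checkpoint, reusing `model` from the Quick Start:

```python
# Inspect architecture details from the model config.
print(model.config.image_size)           # 224
print(model.config.patch_size)           # 16
print(model.config.num_hidden_layers)    # 12
print(model.config.hidden_size)          # 768
print(model.config.num_attention_heads)  # 12

# Total parameter count (~86M for ViT-Base).
print(sum(p.numel() for p in model.parameters()))
```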
### Dataset Information
- **FER2013**: 35,887 grayscale facial images
- **Training Set**: 28,709 images
- **Validation Set**: 3,589 images
- **Test Set**: 3,589 images
- **Classes**: 7 emotions (balanced evaluation set)
## πŸ’‘ Usage Tips
1. **Best Results**: Use clear, front-facing face images
2. **Preprocessing**: Ensure faces are properly cropped and centered (see the cropping sketch after this list)
3. **Lighting**: Good lighting improves accuracy
4. **Resolution**: Higher resolution images work better
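A minimal face-cropping sketch using OpenCV's Haar cascade; this is one simple option, not the detector used to build the model, and any face detector can be substituted:

```python
import cv2
from PIL import Image

# Detect and crop the largest face before running the classifier.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
bgr = cv2.imread("path_to_your_image.jpg")
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    crop = cv2.cvtColor(bgr[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
    face_image = Image.fromarray(crop)  # pass this to the processor from Quick Start
```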
## πŸ› οΈ Model Limitations
- Trained only on FER2013 (limited diversity)
- May struggle with extreme poses or occlusions
- Performance varies across different demographics
- Best suited for clear facial expressions
## πŸ“š Citation
If you use this model, please cite:
```bibtex
@misc{face-emotion-detection,
  author       = {Abhilash},
  title        = {ViT Face Emotion Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abhilash88/face-emotion-detection}}
}
```
## 🀝 Acknowledgments
- FER2013 dataset creators
- Google Research for Vision Transformer
- Hugging Face for the transformers library
- The open-source ML community
## πŸ“„ License
This model is released under the Apache 2.0 License.
---
**Built with ❀️ using Vision Transformers and PyTorch**