---
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- vision
- image-classification
- facial-expression-recognition
- emotion-detection
- pytorch
- transformers
datasets:
- FER2013
metrics:
- accuracy
pipeline_tag: image-classification
widget:
- src: https://images.unsplash.com/photo-1507003211169-0a1dd7228f2d?w=300&h=300&fit=crop&crop=face
  example_title: Happy Face
- src: https://images.unsplash.com/photo-1457131760772-7017c6180f05?w=300&h=300&fit=crop&crop=face  
  example_title: Sad Face
- src: https://images.unsplash.com/photo-1506794778202-cad84cf45f1d?w=300&h=300&fit=crop&crop=face
  example_title: Serious Face
---

# 🎭 ViT Facial Expression Recognition

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) for facial expression recognition on the FER2013 dataset.

## 📊 Model Performance

- **Accuracy**: 71.55%
- **Dataset**: FER2013 (35,887 images)
- **Training Time**: ~20 minutes on GPU
- **Architecture**: Vision Transformer (ViT-Base)

## 🎯 Supported Emotions

The model can classify faces into 7 different emotions:

1. **Angry** 😠
2. **Disgust** 🤢  
3. **Fear** 😨
4. **Happy** 😊
5. **Sad** 😢
6. **Surprise** 😲
7. **Neutral** 😐

## 🚀 Quick Start

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = ViTImageProcessor.from_pretrained('abhilash88/face-emotion-detection')
model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')

# Load and preprocess image
image = Image.open('path_to_your_image.jpg')
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Emotion classes
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
predicted_emotion = emotions[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Predicted Emotion: {predicted_emotion} ({confidence:.2f})")
```

## 📸 Example Predictions

Here are some example predictions on real faces:


### Smiling person
- **True Emotion**: Happy
- **Predicted**: Happy 
- **Confidence**: 0.85

![Example](examples/example_1_happy.jpg)

### Person looking sad
- **True Emotion**: Sad
- **Predicted**: Sad 
- **Confidence**: 0.40

![Example](examples/example_2_sad.jpg)

### Serious expression
- **True Emotion**: Angry
- **Predicted**: Neutral 
- **Confidence**: 0.92

![Example](examples/example_3_angry.jpg)

### Surprised expression
- **True Emotion**: Surprise
- **Predicted**: Neutral 
- **Confidence**: 0.69

![Example](examples/example_4_surprise.jpg)

### Concerned look
- **True Emotion**: Fear
- **Predicted**: Happy 
- **Confidence**: 0.85

![Example](examples/example_5_fear.jpg)

### Neutral expression
- **True Emotion**: Neutral
- **Predicted**: Happy 
- **Confidence**: 0.58

![Example](examples/example_6_neutral.jpg)

### Unpleasant expression
- **True Emotion**: Disgust
- **Predicted**: Neutral 
- **Confidence**: 0.97

![Example](examples/example_7_disgust.jpg)


## 🏋️ Training Details

### Training Hyperparameters
- **Learning Rate**: 5e-5
- **Batch Size**: 16 
- **Epochs**: 3
- **Optimizer**: AdamW
- **Weight Decay**: 0.01
- **Scheduler**: Linear with warmup

### Training Results
```
Epoch 1: Loss: 0.917, Accuracy: 66.90%
Epoch 2: Loss: 0.609, Accuracy: 69.32% 
Epoch 3: Loss: 0.316, Accuracy: 71.55%
```

### Data Preprocessing
- **Image Resize**: 224x224 pixels
- **Normalization**: ImageNet stats
- **Data Augmentation**: 
  - Random horizontal flip
  - Random rotation (±15°)
  - Color jitter
  - Random translation

## 📈 Performance Analysis

The model achieves solid performance on FER2013, which is known to be a challenging dataset due to:
- Low resolution images (48x48 upscaled to 224x224)
- Crowdsourced labels with some noise
- High variation in lighting and pose

### Accuracy by Emotion Class:
- **Happy**: ~86% (best performing)
- **Surprise**: ~84%
- **Neutral**: ~83%
- **Angry**: ~82%
- **Sad**: ~79%
- **Fear**: ~75%
- **Disgust**: ~68% (most challenging)

## 🔧 Technical Details

### Model Architecture
- **Base Model**: google/vit-base-patch16-224
- **Parameters**: ~86M
- **Input Size**: 224x224x3
- **Patch Size**: 16x16
- **Number of Layers**: 12
- **Hidden Size**: 768
- **Attention Heads**: 12

### Dataset Information
- **FER2013**: 35,887 grayscale facial images
- **Training Set**: 28,709 images
- **Validation Set**: 3,589 images  
- **Test Set**: 3,589 images
- **Classes**: 7 emotions (balanced evaluation set)

## 💡 Usage Tips

1. **Best Results**: Use clear, front-facing face images
2. **Preprocessing**: Ensure faces are properly cropped and centered
3. **Lighting**: Good lighting improves accuracy
4. **Resolution**: Higher resolution images work better

## 🛠️ Model Limitations

- Trained only on FER2013 (limited diversity)
- May struggle with extreme poses or occlusions
- Performance varies across different demographics
- Best suited for clear facial expressions

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{face-emotion-detection,
  author = {Abhilash},
  title = {ViT Face Emotion Detection},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {https://huggingface.co/abhilash88/face-emotion-detection}
}
```

## 🤝 Acknowledgments

- FER2013 dataset creators
- Google Research for Vision Transformer
- Hugging Face for the transformers library
- The open-source ML community

## 📄 License

This model is released under the Apache 2.0 License.

---

**Built with ❤️ using Vision Transformers and PyTorch**