---
|
license: apache-2.0 |
|
base_model: google/vit-base-patch16-224 |
|
tags: |
|
- vision |
|
- image-classification |
|
- facial-expression-recognition |
|
- emotion-detection |
|
- pytorch |
|
- transformers |
|
datasets: |
|
- FER2013 |
|
metrics: |
|
- accuracy |
|
pipeline_tag: image-classification |
|
widget: |
|
- src: https://images.unsplash.com/photo-1507003211169-0a1dd7228f2d?w=300&h=300&fit=crop&crop=face |
|
example_title: Happy Face |
|
- src: https://images.unsplash.com/photo-1457131760772-7017c6180f05?w=300&h=300&fit=crop&crop=face |
|
example_title: Sad Face |
|
- src: https://images.unsplash.com/photo-1506794778202-cad84cf45f1d?w=300&h=300&fit=crop&crop=face |
|
example_title: Serious Face |
|
--- |
|
|
|
# 🎭 ViT Facial Expression Recognition
|
|
|
This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) for facial expression recognition on the FER2013 dataset. |
|
|
|
## 📊 Model Performance
|
|
|
- **Accuracy**: 71.55% |
|
- **Dataset**: FER2013 (35,887 images) |
|
- **Training Time**: ~20 minutes on GPU |
|
- **Architecture**: Vision Transformer (ViT-Base) |
|
|
|
## 🎯 Supported Emotions
|
|
|
The model classifies a face into one of seven emotions:



1. **Angry** 😠

2. **Disgust** 🤢

3. **Fear** 😨

4. **Happy** 😊

5. **Sad** 😢

6. **Surprise** 😲

7. **Neutral** 😐
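
If the checkpoint's config stores the label mapping, the list above can also be read programmatically instead of hard-coded; a quick check (assuming the fine-tuned config carries these labels as `id2label`):

```python
from transformers import ViTForImageClassification

# Inspect the label mapping saved with the fine-tuned checkpoint
model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')
print(model.config.id2label)  # expected (assumption): {0: 'Angry', 1: 'Disgust', ...}
```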
|
|
|
## 🚀 Quick Start
|
|
|
```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = ViTImageProcessor.from_pretrained('abhilash88/face-emotion-detection')
model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')

# Load and preprocess the image (convert to RGB in case the source is grayscale)
image = Image.open('path_to_your_image.jpg').convert('RGB')
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map class index to emotion label
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
predicted_emotion = emotions[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Predicted Emotion: {predicted_emotion} ({confidence:.2f})")
```
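
The same checkpoint also works with the high-level `pipeline` API, which bundles preprocessing and label mapping into one call:

```python
from transformers import pipeline

# The image-classification pipeline handles resizing, normalization, and labels
classifier = pipeline("image-classification", model="abhilash88/face-emotion-detection")

# Accepts a local path, a URL, or a PIL image; returns labels sorted by score
for prediction in classifier("path_to_your_image.jpg"):
    print(f"{prediction['label']}: {prediction['score']:.2f}")
```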
|
|
|
## 📸 Example Predictions
|
|
|
Here are some example predictions on real faces: |
|
|
|
|
|
### Smiling person |
|
- **True Emotion**: Happy |
|
- **Predicted**: Happy |
|
- **Confidence**: 0.85 |
|
|
|
 |
|
|
|
### Person looking sad |
|
- **True Emotion**: Sad |
|
- **Predicted**: Sad |
|
- **Confidence**: 0.40 |
|
|
|
 |
|
|
|
### Serious expression |
|
- **True Emotion**: Angry |
|
- **Predicted**: Neutral |
|
- **Confidence**: 0.92 |
|
|
|
 |
|
|
|
### Surprised expression |
|
- **True Emotion**: Surprise |
|
- **Predicted**: Neutral |
|
- **Confidence**: 0.69 |
|
|
|
 |
|
|
|
### Concerned look |
|
- **True Emotion**: Fear |
|
- **Predicted**: Happy |
|
- **Confidence**: 0.85 |
|
|
|
 |
|
|
|
### Neutral expression |
|
- **True Emotion**: Neutral |
|
- **Predicted**: Happy |
|
- **Confidence**: 0.58 |
|
|
|
 |
|
|
|
### Unpleasant expression |
|
- **True Emotion**: Disgust |
|
- **Predicted**: Neutral |
|
- **Confidence**: 0.97 |
|
|
|
 |
|
|
|
|
|
## 🏋️ Training Details
|
|
|
### Training Hyperparameters |
|
- **Learning Rate**: 5e-5 |
|
- **Batch Size**: 16 |
|
- **Epochs**: 3 |
|
- **Optimizer**: AdamW |
|
- **Weight Decay**: 0.01 |
|
- **Scheduler**: Linear with warmup |
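
The original training script is not part of this card; as a rough sketch, the hyperparameters above map onto `transformers.TrainingArguments` like so (the warmup ratio and output path are assumptions, since only "linear with warmup" is stated):

```python
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameters above, not the original script
training_args = TrainingArguments(
    output_dir="vit-fer2013",           # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,                  # AdamW is the Trainer default
    lr_scheduler_type="linear",
    warmup_ratio=0.1,                   # assumption: only "linear with warmup" is stated
)
```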
|
|
|
### Training Results |
|
```
Epoch 1: Loss: 0.917, Accuracy: 66.90%
Epoch 2: Loss: 0.609, Accuracy: 69.32%
Epoch 3: Loss: 0.316, Accuracy: 71.55%
```
|
|
|
### Data Preprocessing |
|
- **Image Resize**: 224x224 pixels |
|
- **Normalization**: ImageNet stats |
|
- **Data Augmentation** (sketched below):
|
- Random horizontal flip |
|
  - Random rotation (±15°)
|
- Color jitter |
|
- Random translation |
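
One way to express the augmentations above with torchvision (a sketch; the jitter strengths and translation range are assumptions, only the transform types are stated):

```python
import torchvision.transforms as T

# Sketch of the described training-time preprocessing and augmentation
train_transforms = T.Compose([
    T.Resize((224, 224)),
    T.RandomHorizontalFlip(),
    T.RandomRotation(15),                                 # ±15°
    T.ColorJitter(brightness=0.2, contrast=0.2),          # assumed strengths
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),      # assumed ±10% shift
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],               # ImageNet stats
                std=[0.229, 0.224, 0.225]),
])
```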
|
|
|
## 📈 Performance Analysis
|
|
|
The model achieves solid performance on FER2013, which is known to be a challenging dataset due to: |
|
- Low-resolution images (48x48, upscaled to 224x224)
|
- Crowdsourced labels with some noise |
|
- High variation in lighting and pose |
|
|
|
### Accuracy by Emotion Class
|
- **Happy**: ~86% (best performing) |
|
- **Surprise**: ~84% |
|
- **Neutral**: ~83% |
|
- **Angry**: ~82% |
|
- **Sad**: ~79% |
|
- **Fear**: ~75% |
|
- **Disgust**: ~68% (most challenging) |
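
Per-class numbers like these come from test-set predictions; scikit-learn's `classification_report` is one way to compute them (a sketch with placeholder labels; in practice `y_true` and `y_pred` would be collected by running the model over the FER2013 test split):

```python
import numpy as np
from sklearn.metrics import classification_report

emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# Placeholder labels for illustration; replace with real test-split predictions
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=100)
y_pred = rng.integers(0, 7, size=100)

# Per-class recall corresponds to the per-emotion accuracies listed above
print(classification_report(y_true, y_pred, labels=list(range(7)),
                            target_names=emotions, zero_division=0))
```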
|
|
|
## 🔧 Technical Details
|
|
|
### Model Architecture |
|
- **Base Model**: google/vit-base-patch16-224 |
|
- **Parameters**: ~86M |
|
- **Input Size**: 224x224x3 |
|
- **Patch Size**: 16x16 |
|
- **Number of Layers**: 12 |
|
- **Hidden Size**: 768 |
|
- **Attention Heads**: 12 |
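
These figures can be checked directly against the checkpoint's config:

```python
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')
config = model.config

print(f"Layers:          {config.num_hidden_layers}")      # 12
print(f"Hidden size:     {config.hidden_size}")            # 768
print(f"Attention heads: {config.num_attention_heads}")    # 12
print(f"Patch size:      {config.patch_size}")             # 16
print(f"Parameters:      {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")
```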
|
|
|
### Dataset Information |
|
- **FER2013**: 35,887 grayscale facial images |
|
- **Training Set**: 28,709 images |
|
- **Validation Set**: 3,589 images |
|
- **Test Set**: 3,589 images |
|
- **Classes**: 7 emotions (balanced evaluation set) |
|
|
|
## 💡 Usage Tips
|
|
|
1. **Best Results**: Use clear, front-facing face images

2. **Preprocessing**: Ensure faces are properly cropped and centered (see the cropping sketch after this list)

3. **Lighting**: Good lighting improves accuracy

4. **Resolution**: Higher-resolution images work better
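
For tip 2, a face detector can do the cropping automatically; below is a minimal sketch using OpenCV's bundled Haar cascade (the detection parameters are assumptions):

```python
import cv2
from PIL import Image

# Detect and crop the face region before handing the image to the processor
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("path_to_your_image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]                          # use the first detected face
    crop = cv2.cvtColor(img[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
    image = Image.fromarray(crop)                  # ready for the Quick Start code
```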
|
|
|
## 🛠️ Model Limitations
|
|
|
- Trained only on FER2013 (limited diversity) |
|
- May struggle with extreme poses or occlusions |
|
- Performance varies across different demographics |
|
- Best suited for clear facial expressions |
|
|
|
## 📚 Citation
|
|
|
If you use this model, please cite: |
|
|
|
```bibtex
@misc{face-emotion-detection,
  author = {Abhilash},
  title = {ViT Face Emotion Detection},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abhilash88/face-emotion-detection}}
}
```
|
|
|
## 🤝 Acknowledgments
|
|
|
- FER2013 dataset creators |
|
- Google Research for Vision Transformer |
|
- Hugging Face for the transformers library |
|
- The open-source ML community |
|
|
|
## 📄 License
|
|
|
This model is released under the Apache 2.0 License. |
|
|
|
--- |
|
|
|
**Built with ❤️ using Vision Transformers and PyTorch**
|
|