---
language: en
tags:
- image-classification
- vision-transformer
- protovit
- pins
license: mit
---

# ProtoViT Model - deit_small_patch16_224 (PINS)

This is a deit_small_patch16_224 model fine-tuned on the Pinterest Face Recognition Dataset (PINS) using the ProtoViT method from the paper ["Interpretable Image Classification with Adaptive Prototype-based Vision Transformers"](https://arxiv.org/abs/2410.20722).

## Model Details

- Base architecture: deit_small_patch16_224
- Dataset: Pinterest Face Recognition Dataset
- Number of classes: 155
- Fine-tuned checkpoint: `14finetuned0.8042`
- Accuracy: 80.42%

## Training Details

- Number of prototypes: 1550
- Prototype size: 1×1
- Training process: Warm up → Joint training → Push → Last layer fine-tuning
- Weight coefficients:
  - Cross entropy: 1.0
  - Clustering: -0.8
  - Separation: 0.1
  - L1: 0.01
  - Orthogonal: 0.001
  - Coherence: 0.003
- Batch size: 128

## Dataset Description

A face recognition dataset collected from Pinterest, containing 155 identity classes.

Dataset link: https://www.kaggle.com/datasets/hereisburak/pins-face-recognition

## Usage

```python
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Ayushnangia/protovit-deit_small_patch16_224-pins")
processor = AutoImageProcessor.from_pretrained("Ayushnangia/protovit-deit_small_patch16_224-pins")

# Prepare image (convert to RGB so grayscale or RGBA files are handled)
image = Image.open("path_to_your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Make prediction without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
```

## Additional Information

The paper's authors maintain the original [GitHub repository](https://github.com/Henrymachiyu/ProtoViT).

For more details about the implementation and training process, please visit my fork of ProtoViT: [GitHub repository](https://github.com/ayushnangia/ProtoViT).
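
As a rough illustration of how the weight coefficients listed under Training Details might combine into a single training objective, the sketch below sums each loss term scaled by its coefficient. The function name `combine_losses`, the term names, and the example values are hypothetical and chosen only to show the arithmetic. The actual ProtoViT code may compute or combine these terms differently.

```python
# Coefficients copied from the "Weight coefficients" list in Training Details.
COEFS = {
    "cross_entropy": 1.0,
    "clustering": -0.8,
    "separation": 0.1,
    "l1": 0.01,
    "orthogonal": 0.001,
    "coherence": 0.003,
}


def combine_losses(terms, coefs=COEFS):
    """Return the coefficient-weighted sum of the individual loss terms.

    `terms` maps each (hypothetical) term name to its scalar value.
    """
    missing = set(coefs) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(coefs[name] * terms[name] for name in coefs)


# Made-up values, not taken from any training run.
example_terms = {
    "cross_entropy": 2.0,
    "clustering": 0.5,
    "separation": 0.2,
    "l1": 10.0,
    "orthogonal": 1.0,
    "coherence": 1.0,
}
total = combine_losses(example_terms)
print(f"total loss: {total:.4f}")
```

Because the clustering coefficient is negative, a larger clustering term lowers the weighted sum, so minimizing the total pushes that term upward while the positively weighted terms are pushed downward.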