---
language: en
tags:
- image-classification
- vision-transformer
- protovit
- cub
license: mit
---

# ProtoViT Model - deit_small_patch16_224 (CUB)

This is a fine-tuned `deit_small_patch16_224` model trained on CUB-200-2011, from the paper ["Interpretable Image Classification with Adaptive Prototype-based Vision Transformers"](https://arxiv.org/abs/2410.20722).

## Model Details

- Base architecture: deit_small_patch16_224
- Dataset: CUB-200-2011
- Number of classes: 200
- Fine-tuned checkpoint: `14finetuned0.8576`
- Accuracy: 85.76%

## Training Details

- Number of prototypes: 2000
- Prototype size: 1×1
- Training process: Warm-up → Joint training → Push → Last-layer fine-tuning
- Weight coefficients:
  - Cross entropy: 1.0
  - Clustering: -0.8
  - Separation: 0.1
  - L1: 0.01
  - Orthogonal: 0.001
  - Coherence: 0.003
- Batch size: 128

## Dataset Description

CUB-200-2011 is a fine-grained bird species classification dataset covering 200 different bird species.

Dataset link: https://www.vision.caltech.edu/datasets/cub_200_2011/

## Usage

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Ayushnangia/protovit-deit_small_patch16_224-cub")
processor = AutoImageProcessor.from_pretrained("Ayushnangia/protovit-deit_small_patch16_224-cub")

# Prepare image
image = Image.open("path_to_your_image.jpg")
inputs = processor(images=image, return_tensors="pt")

# Make prediction
outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
```

## Additional Information

GitHub repository by the authors of the paper: [ProtoViT](https://github.com/Henrymachiyu/ProtoViT)

For more details about the implementation and training process, please visit my fork of ProtoViT: [ayushnangia/ProtoViT](https://github.com/ayushnangia/ProtoViT).
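The usage snippet above returns only the single argmax index. A minimal sketch of turning the model's logits into top-5 class probabilities is shown below; it uses a hand-built dummy logits tensor (shape `[1, 200]`, matching the 200 CUB classes) in place of `outputs.logits`, since the real scores come from the model call:

```python
import torch

# Dummy logits for a batch of one image over 200 CUB classes.
# In practice, replace this with `outputs.logits` from the model call above.
logits = torch.zeros(1, 200)
logits[0, 17] = 5.0  # pretend class index 17 scored highest
logits[0, 3] = 2.0   # pretend class index 3 is the runner-up

# Softmax over the class dimension, then take the 5 most likely classes.
probs = logits.softmax(dim=-1)
top_probs, top_ids = probs.topk(5, dim=-1)

for p, idx in zip(top_probs[0], top_ids[0]):
    print(f"class {idx.item()}: {p.item():.4f}")
```

If the checkpoint's config carries label names, `model.config.id2label[idx.item()]` can map each index to a species name instead of printing the raw index.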