
🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT)-based AI model fine-tuned for image classification on the CIFAR-100 dataset. The model is built on google/vit-base-patch16-224, quantized to FP16 for efficient inference, and achieves roughly 70–80% top-1 accuracy on the 100-class classification task (see Evaluation below).


πŸš€ Features

  • πŸ–ΌοΈ Task: Image Classification
  • 🧠 Base Model: google/vit-base-patch16-224 (Vision Transformer)
  • πŸ§ͺ Quantized: FP16 for faster and memory-efficient inference
  • 🎯 Dataset: CIFAR-100 (100 fine-grained object categories)
  • ⚑ CUDA Enabled: Optimized for GPU acceleration
  • πŸ“ˆ High Accuracy: Fine-tuned and evaluated on validation split

πŸ“Š Dataset Used

Hugging Face Dataset: tanganke/cifar100

  • Description: CIFAR-100 is a dataset of 60,000 32Γ—32 color images in 100 classes (600 images per class)
  • Split: 50,000 training images and 10,000 test images
  • Categories: Animals, Vehicles, Food, Household items, etc.
  • License: MIT License (from source)
from datasets import load_dataset

dataset = load_dataset("tanganke/cifar100")
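
The ViT checkpoint expects 224Γ—224 inputs, so the 32Γ—32 CIFAR images are resized and normalized during preprocessing. The feature extractor handles this automatically; the helper below is only a minimal sketch of the normalization step, assuming the mean/std of 0.5 used by this checkpoint's preprocessor:

```python
# CIFAR-100 images are 32x32; the ViT checkpoint resizes them to 224x224
# and scales pixel values into the range the model was trained on.
def normalize_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit pixel value (0-255) to the model's input range."""
    return (value / 255.0 - mean) / std

# With mean=std=0.5, pixel 0 maps to -1.0 and pixel 255 maps to 1.0
print(normalize_pixel(0), normalize_pixel(255))
```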

πŸ› οΈ Model & Training Configuration

  • Model: google/vit-base-patch16-224

  • Image Size: 224x224 (resized from 32x32)

  • Framework: Hugging Face Transformers & Datasets

  • Training Environment: Kaggle Notebook with CUDA

  • Epochs: 5–10 (with early stopping)

  • Batch Size: 32

  • Optimizer: AdamW

  • Loss Function: CrossEntropyLoss
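
The configuration above maps directly onto the Hugging Face Trainer API. The sketch below is illustrative rather than the exact training script: the learning rate and the early-stopping wiring are assumptions, while the steps-per-epoch arithmetic follows from the 50,000-image training split and batch size of 32.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch (a final partial batch counts as one step)."""
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(50_000, 32))  # 1563 steps per epoch

def build_training_args():
    # Requires `transformers`; values mirror the configuration listed above,
    # except learning_rate, which is an assumed typical value for ViT fine-tuning.
    from transformers import TrainingArguments
    return TrainingArguments(
        output_dir="vit-cifar100-fp16",
        per_device_train_batch_size=32,
        num_train_epochs=5,
        learning_rate=5e-5,
        fp16=True,                     # FP16 mixed-precision training
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,   # required for early-stopping callbacks
    )
```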

βœ… Evaluation & Scoring

  • Accuracy: ~70–80% (varies by configuration)

  • Validation Tool: evaluate or sklearn.metrics

  • Metrics: Top-1 and Top-5 accuracy

  • Inference Speed: Significantly faster after quantization
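
Top-1 counts a prediction as correct only when the highest-scoring class is the true label, while Top-5 accepts any of the five highest. A minimal, library-free sketch of both metrics (the logits and labels here are toy values, not outputs of this model):

```python
def top_k_accuracy(logits, labels, k=1):
    """Fraction of examples whose true label is among the k highest logits."""
    hits = 0
    for row, label in zip(logits, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

# Toy example with 3 classes: the second prediction is only right in the top-2
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
print(top_k_accuracy(logits, labels, k=1))  # 2 of 3 correct
print(top_k_accuracy(logits, labels, k=2))  # all 3 correct
```

The same numbers can be obtained with the evaluate library's accuracy metric or sklearn.metrics, as listed above.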

πŸ” Inference Example

from PIL import Image
import torch
from transformers import ViTFeatureExtractor, ViTForImageClassification

# Load the feature extractor and the fine-tuned FP16 checkpoint
extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16").to("cuda")
model.eval()

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = extractor(images=image, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = outputs.logits.argmax(-1).item()
    # Map the class id back to its CIFAR-100 fine label name;
    # `dataset` comes from load_dataset("tanganke/cifar100") above
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))
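
The FP16 quantization mentioned above halves the memory needed to store the weights; in PyTorch the conversion itself is a single `model.half()` call. The sketch below just quantifies the saving for this model's 86.6M parameters:

```python
def param_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in MiB."""
    return num_params * bytes_per_param / 1024 ** 2

NUM_PARAMS = 86_600_000  # ~86.6M parameters, as reported for this checkpoint

fp32_mb = param_memory_mb(NUM_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_mb = param_memory_mb(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
print(f"FP32: {fp32_mb:.0f} MiB, FP16: {fp16_mb:.0f} MiB")
```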

πŸ“ Folder Structure

πŸ“¦image-classification-vit
 ┣ πŸ“‚vit-cifar100-fp16
 ┣ πŸ“œtrain.py
 ┣ πŸ“œinference.py
 ┣ πŸ“œREADME.md
 β”— πŸ“œrequirements.txt

Model size: 86.6M params Β· Tensor type: F16 (Safetensors)