Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of google/vit-large-patch16-224, optimized for a custom image classification task.


📌 Model Overview

  • Base model: google/vit-large-patch16-224
  • Architecture: Vision Transformer (ViT)
  • Parameters: ~303M (float32)
  • Patch size: 16×16
  • Image resolution: 224×224 (see the patch-count sketch below)
  • Frameworks: PyTorch, Hugging Face Transformers
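
At this patch size and resolution, every image is split into (224 / 16)² = 196 patches, each embedded as one token in the transformer's input sequence (plus a classification token). A minimal sketch that reads the geometry from the base model's config:

```python
from transformers import ViTConfig

# Fetch the base model's configuration to confirm the patch geometry
config = ViTConfig.from_pretrained("google/vit-large-patch16-224")
num_patches = (config.image_size // config.patch_size) ** 2
print(num_patches)  # (224 // 16) ** 2 = 196 patch tokens per image
```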

📊 Performance

| Metric                 | Value             |
|------------------------|-------------------|
| Final validation loss  | 0.3268            |
| Lowest validation loss | 0.2548 (epoch 18) |

Training and validation loss trends indicate good convergence, with mild overfitting after roughly 30 epochs; the lowest validation loss is reached at epoch 18, well before the final epoch.
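
Because the best checkpoint precedes the final one, re-runs may benefit from best-checkpoint restoration or early stopping. A minimal sketch using the Trainer's built-in EarlyStoppingCallback; the strategies and patience value here are illustrative assumptions, not the settings used for the published run:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch only: strategies and patience are assumptions, not the published settings.
args = TrainingArguments(
    output_dir="output-models",
    eval_strategy="epoch",            # evaluate once per epoch
    save_strategy="epoch",            # must match eval_strategy for best-model tracking
    load_best_model_at_end=True,      # restore the lowest-eval-loss checkpoint when done
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Stop if eval loss fails to improve for 5 consecutive evaluations;
# pass to Trainer(..., callbacks=[early_stopping]).
early_stopping = EarlyStoppingCallback(early_stopping_patience=5)
```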


🔧 Training Configuration

| Hyperparameter     | Value                                                                   |
|--------------------|-------------------------------------------------------------------------|
| Learning rate      | 2e-5                                                                    |
| Train batch size   | 20                                                                      |
| Eval batch size    | 8                                                                       |
| Optimizer          | AdamW (betas=(0.9, 0.999), eps=1e-8)                                    |
| LR scheduler       | Linear                                                                  |
| Epochs             | 40                                                                      |
| Seed               | 42                                                                      |
| Framework versions | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
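
These hyperparameters map directly onto TrainingArguments, as sketched below. The output directory is a placeholder, the batch sizes are assumed to be per device, and the AdamW betas and epsilon listed above are the Trainer defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Sketch of the configuration above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="output-models",
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",   # linear decay, as listed in the table
    seed=42,
)
```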

📂 Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1     | 24   | 0.5601          |
| 5     | 120  | 0.3421          |
| 10    | 240  | 0.2901          |
| 14    | 336  | 0.2737          |
| 18    | 432  | 0.2548          |
| 40    | 960  | 0.3268          |

The cadence of 24 optimizer steps per epoch with a train batch size of 20 suggests a training set of roughly 480 images (assuming no gradient accumulation).

🛠 Intended Uses

  • Image classification on datasets with characteristics similar to the training dataset.
  • Fine-tuning for domain-specific classification tasks (a sketch follows below).
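
For the second use case, the classification head must be re-initialized whenever the new task has a different label set. A minimal sketch, assuming a hypothetical three-class task (the label names are placeholders):

```python
from transformers import AutoModelForImageClassification

# Hypothetical label set for a new downstream task
labels = ["class_a", "class_b", "class_c"]

model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=len(labels),
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
    ignore_mismatched_sizes=True,  # swap in a freshly initialized classification head
)
```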

⚠ Limitations

  • Trained on a custom dataset, so it may not generalize well to unrelated domains without additional fine-tuning.
  • No guarantees on fairness, bias, or ethical implications without a dedicated dataset analysis.

🚀 How to Use

You can use this model in two main ways:

1️⃣ Using the High-Level pipeline API

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
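
The pipeline also accepts local file paths and PIL.Image objects (the path below is a placeholder), and top_k limits how many of the highest-scoring labels are returned:

```python
# Keep only the three highest-scoring labels for a local image
result = pipe("path/to/local_image.jpg", top_k=3)
print(result)
```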

2️⃣ Using the Processor and Model Directly

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess: resize, normalize, and convert to a batched tensor
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_id = logits.argmax(-1).item()

print("Predicted class:", model.config.id2label[predicted_class_id])
```