# Vision Transformer (ViT) Fine-Tuned Model
This repository contains a fine-tuned version of google/vit-large-patch16-224, optimized for a custom image classification task.
## Model Overview
- Base model: `google/vit-large-patch16-224`
- Architecture: Vision Transformer (ViT)
- Patch size: 16×16
- Image resolution: 224×224
- Frameworks: PyTorch, Hugging Face Transformers
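
These details can also be read straight from the uploaded checkpoint's configuration. A minimal sketch (the expected values in the comments assume the config matches the base model):

```python
from transformers import AutoConfig

# Load only the configuration (no weights needed) and print the
# architecture details listed above.
config = AutoConfig.from_pretrained("rakib730/output-models")
print(config.model_type)   # expected: "vit"
print(config.patch_size)   # expected: 16
print(config.image_size)   # expected: 224
```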
## Performance
| Metric | Value |
|---|---|
| Final Validation Loss | 0.3268 |
| Lowest Validation Loss | 0.2548 (Epoch 18) |
Training and validation loss trends indicate good convergence: validation loss reaches its minimum (0.2548) at epoch 18 and drifts upward to 0.3268 by epoch 40, indicating slight overfitting in later epochs.
## Training Configuration
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Train batch size | 20 |
| Eval batch size | 8 |
| Optimizer | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| LR scheduler | Linear |
| Epochs | 40 |
| Seed | 42 |
| Framework versions | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
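
For orientation, the table above maps roughly onto a Hugging Face `TrainingArguments` object as sketched below. This is an assumed reconstruction for illustration, not the author's actual training script; `output_dir` and the evaluation strategy are placeholders.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments mirroring the hyperparameters above
# (illustrative reconstruction; not the original training script).
training_args = TrainingArguments(
    output_dir="output-models",    # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    seed=42,
    adam_beta1=0.9,                # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,             # AdamW eps=1e-8
    eval_strategy="epoch",         # assumed from the per-epoch results below
)
```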
## Training Results
| Epoch | Step | Validation Loss |
|---|---|---|
| 1 | 24 | 0.5601 |
| 5 | 120 | 0.3421 |
| 10 | 240 | 0.2901 |
| 14 | 336 | 0.2737 |
| 18 | 432 | 0.2548 |
| 40 | 960 | 0.3268 |
## Intended Uses
- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks (see the sketch below).
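
A minimal sketch of the second use case: re-using this checkpoint as a starting point for a new label set. The class count here is a hypothetical placeholder, not a property of this model.

```python
from transformers import AutoModelForImageClassification

# Hypothetical fine-tuning setup: load the checkpoint with a fresh
# classification head sized for your own dataset.
model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=5,                  # placeholder; set to your dataset's class count
    ignore_mismatched_sizes=True,  # re-initializes the classification head
)
```

From here the model can be trained as usual, for example with the `Trainer` API using arguments like those in the Training Configuration section.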
## Limitations
- Trained on a custom dataset, so it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees on fairness, bias, or ethical implications without dataset analysis.
## How to Use
You can use this model in two main ways:
### 1. Using the High-Level `pipeline` API
```python
from transformers import pipeline

pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
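
The pipeline also accepts a local file path or a `PIL.Image.Image` object in place of the URL, and passing `top_k` (for example, `pipe(image, top_k=3)`) limits the output to the highest-scoring labels.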
### 2. Using the Processor and Model Directly
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

predicted_class_id = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_id])
```