---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- vit
- fine-tuned
- transformers
datasets:
- your-dataset-name
model-index:
- name: ViT-Large-Patch16-224 Fine-tuned Model
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Validation Loss
      type: loss
      value: 0.3268
---

# Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of **[google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224)**, optimized for a custom image classification task.

---

## 🔍 Model Overview

- **Base model**: `google/vit-large-patch16-224`
- **Architecture**: Vision Transformer (ViT)
- **Patch size**: 16×16
- **Image resolution**: 224×224
- **Frameworks**: PyTorch, Hugging Face Transformers
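
The patch size and resolution fix the sequence length the transformer sees: a 224×224 input split into 16×16 patches yields 196 patch tokens plus one `[CLS]` token. A minimal sketch to confirm this from the hosted config (assuming the standard `transformers` ViT config fields):

```python
from transformers import AutoConfig

# Read the architecture settings straight from the hosted config
config = AutoConfig.from_pretrained("rakib730/output-models")

patches_per_side = config.image_size // config.patch_size  # 224 // 16 = 14
num_patches = patches_per_side ** 2                        # 14 * 14 = 196
print(f"{num_patches} patch tokens + 1 [CLS] token per image")
```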

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| **Final Validation Loss** | **0.3268** |
| **Lowest Validation Loss** | **0.2548** (Epoch 18) |

Training and validation loss trends indicate good convergence, with slight overfitting in the later epochs: validation loss bottoms out at 0.2548 at epoch 18 and climbs back to 0.3268 by epoch 40.
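
For reference, a validation loss like the one reported above can be computed with a plain evaluation loop. This is a minimal sketch, not the exact training code; `val_loader` is an assumed `DataLoader` yielding batches with `pixel_values` and `labels`:

```python
import torch

def average_validation_loss(model, val_loader, device="cpu"):
    """Mean classification loss over a validation dataloader."""
    model.to(device).eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for batch in val_loader:
            # Passing labels makes the model return its cross-entropy loss
            outputs = model(
                pixel_values=batch["pixel_values"].to(device),
                labels=batch["labels"].to(device),
            )
            total += outputs.loss.item()
            batches += 1
    return total / batches
```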

---

## 🔧 Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| **Learning rate** | `2e-5` |
| **Train batch size** | `20` |
| **Eval batch size** | `8` |
| **Optimizer** | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| **LR scheduler** | Linear |
| **Epochs** | `40` |
| **Seed** | `42` |
| **Framework versions** | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
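
Expressed as Hugging Face `TrainingArguments`, the table above corresponds roughly to the following (a sketch, not the original training script; `output_dir` is a placeholder, and the AdamW betas/epsilon in the table are the `TrainingArguments` defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output-models",      # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",      # linear LR schedule
    seed=42,
    eval_strategy="epoch",           # evaluate once per epoch
    # AdamW with betas=(0.9, 0.999) and eps=1e-8 is the default optimizer
)
```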

---

## 📈 Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1 | 24 | 0.5601 |
| 5 | 120 | 0.3421 |
| 10 | 240 | 0.2901 |
| 14 | 336 | 0.2737 |
| 18 | 432 | **0.2548** |
| 40 | 960 | 0.3268 |

---

## 📌 Intended Uses

- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks (see the sketch below).
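
A minimal fine-tuning sketch under assumed inputs: `num_labels=5` stands in for a hypothetical 5-class target task, and `train_ds`/`val_ds` are placeholder datasets already preprocessed into `pixel_values` and `labels`:

```python
from transformers import AutoModelForImageClassification, Trainer, TrainingArguments

# ignore_mismatched_sizes swaps in a freshly initialized classification
# head when the new label count differs from the saved checkpoint
model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=5,                    # hypothetical target task
    ignore_mismatched_sizes=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3),
    train_dataset=train_ds,          # placeholder preprocessed dataset
    eval_dataset=val_ds,             # placeholder preprocessed dataset
)
trainer.train()
```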

---

## ⚠️ Limitations

- Trained on a **custom dataset**, so it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees regarding fairness, bias, or ethical implications; the training data has not been analyzed for these properties.

---

## 🚀 How to Use

You can use this model in two main ways:

### **1️⃣ Using the High-Level `pipeline` API**

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
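
The pipeline also accepts local file paths and `PIL.Image` objects, and `top_k` controls how many labels are returned (a small usage note; the file name below is a placeholder):

```python
from PIL import Image

image = Image.open("my_image.png")  # placeholder local file
print(pipe(image, top_k=3))         # three highest-scoring labels
```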

### **2️⃣ Using the Processor and Model Directly**

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class_id = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_id])
```