---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- vit
- fine-tuned
- transformers
datasets:
- your-dataset-name
model-index:
- name: ViT-Large-Patch16-224 Fine-tuned Model
results:
- task:
name: Image Classification
type: image-classification
metrics:
- name: Validation Loss
type: loss
value: 0.3268
---
# Vision Transformer (ViT) Fine-Tuned Model
This repository contains a fine-tuned version of **[google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224)**, optimized for a custom image classification task.
---
## Model Overview
- **Base model**: `google/vit-large-patch16-224`
- **Architecture**: Vision Transformer (ViT)
- **Patch size**: 16×16
- **Image resolution**: 224×224
- **Frameworks**: PyTorch, Hugging Face Transformers
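The architectural details above can be verified directly from the model configuration; a minimal sketch using the Transformers `AutoConfig` API:
```python
from transformers import AutoConfig
# Inspect the base model's configuration
config = AutoConfig.from_pretrained("google/vit-large-patch16-224")
print(config.model_type)   # "vit"
print(config.patch_size)   # 16
print(config.image_size)   # 224
```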
---
## Performance
| Metric | Value |
|--------|-------|
| **Final Validation Loss** | **0.3268** |
| **Lowest Validation Loss** | **0.2548** (Epoch 18) |
Training loss and validation loss trends indicate good convergence with slight overfitting after ~30 epochs.
---
## Training Configuration
| Hyperparameter | Value |
|----------------|-------|
| **Learning rate** | `2e-5` |
| **Train batch size** | `20` |
| **Eval batch size** | `8` |
| **Optimizer** | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| **LR scheduler** | Linear |
| **Epochs** | `40` |
| **Seed** | `42` |
| **Framework versions** | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
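The hyperparameters above roughly correspond to the following `TrainingArguments`; this is a sketch rather than the exact training script, and `output_dir` is a placeholder:
```python
from transformers import TrainingArguments
# Sketch of the training setup listed in the table above
# (output_dir is a placeholder; other run details are unknown)
training_args = TrainingArguments(
    output_dir="vit-finetune",
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```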
---
## Training Results
| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1 | 24 | 0.5601 |
| 5 | 120 | 0.3421 |
| 10 | 240 | 0.2901 |
| 14 | 336 | 0.2737 |
| 18 | 432 | **0.2548** |
| 40 | 960 | 0.3268 |
---
## Intended Uses
- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks.
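For domain-specific fine-tuning, the classification head can be replaced at load time; a brief sketch (the label count below is a hypothetical example):
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
# Replace the classification head with one sized for your own label set;
# the new head is randomly initialised and must be trained on your data.
num_labels = 5  # hypothetical example value
model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=num_labels,
    ignore_mismatched_sizes=True,
)
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
```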
---
## Limitations
- Trained on a **custom dataset**; it may not generalize well to unrelated domains without additional fine-tuning.
- No fairness or bias analysis of the training dataset has been performed, so no guarantees are made regarding biased or otherwise harmful predictions.
---
## How to Use
You can use this model in two main ways:
### **1. Using the High-Level `pipeline` API**
```python
from transformers import pipeline
pipe = pipeline("image-classification", model="rakib730/output-models")
# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
### **2. Using the Processor and Model Directly**
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch
# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")
# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# Preprocess
inputs = processor(images=image, return_tensors="pt")
# Inference
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_class_id = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_id])
```
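To turn the raw logits into readable class probabilities, a short follow-up to the snippet above (it reuses the `logits` and `model` variables defined there):
```python
import torch
# Convert logits to probabilities and print the top predictions
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=min(5, probs.shape[-1]), dim=-1)
for score, idx in zip(top.values[0], top.indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {score.item():.3f}")
```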