---
language: en
license: apache-2.0
tags:
  - vision
  - image-classification
  - vit
  - fine-tuned
  - transformers
datasets:
  - your-dataset-name
model-index:
  - name: ViT-Large-Patch16-224 Fine-tuned Model
    results:
      - task:
          name: Image Classification
          type: image-classification
        metrics:
          - name: Validation Loss
            type: loss
            value: 0.3268
---

# Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of [google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224), optimized for a custom image classification task.


## πŸ“Œ Model Overview

- **Base model:** `google/vit-large-patch16-224`
- **Architecture:** Vision Transformer (ViT)
- **Patch size:** 16Γ—16
- **Image resolution:** 224Γ—224
- **Frameworks:** PyTorch, Hugging Face Transformers
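
These settings can be read straight from the checkpoint's configuration; a quick sanity check (this downloads only the config file, not the weights):

```python
from transformers import AutoConfig

# Fetch just the configuration for this checkpoint
config = AutoConfig.from_pretrained("rakib730/output-models")

print(config.model_type)   # "vit"
print(config.image_size)   # 224
print(config.patch_size)   # 16
```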

## πŸ“Š Performance

| Metric                 | Value             |
|------------------------|-------------------|
| Final Validation Loss  | 0.3268            |
| Lowest Validation Loss | 0.2548 (Epoch 18) |

Training and validation loss trends indicate good convergence, with slight overfitting appearing after roughly 30 epochs.
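
Since the lowest validation loss occurred at epoch 18, one way to avoid keeping the slightly overfit final weights in a re-run is to checkpoint the best model and stop early. A minimal sketch with the `Trainer` API; these are illustrative settings, not the configuration actually used for this model:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Hypothetical re-run: restore the epoch with the lowest validation loss
# (epoch 18 here) instead of the final, slightly overfit weights.
args = TrainingArguments(
    output_dir="output-models",        # assumed path
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Pass to Trainer(..., callbacks=[...]) to stop once eval loss stops improving
early_stop = EarlyStoppingCallback(early_stopping_patience=5)
```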


## πŸ”§ Training Configuration

| Hyperparameter     | Value                                                                        |
|--------------------|------------------------------------------------------------------------------|
| Learning rate      | 2e-5                                                                         |
| Train batch size   | 20                                                                           |
| Eval batch size    | 8                                                                            |
| Optimizer          | AdamW (betas=(0.9, 0.999), eps=1e-8)                                         |
| LR scheduler       | Linear                                                                       |
| Epochs             | 40                                                                           |
| Seed               | 42                                                                           |
| Framework versions | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2  |
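
For reference, these hyperparameters map roughly onto Hugging Face `TrainingArguments` as below. This is a reconstruction from the table, not the exact training script; `output_dir` is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output-models",        # assumed path
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    optim="adamw_torch",               # AdamW optimizer
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```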

## πŸ“‚ Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1     | 24   | 0.5601          |
| 5     | 120  | 0.3421          |
| 10    | 240  | 0.2901          |
| 14    | 336  | 0.2737          |
| 18    | 432  | 0.2548          |
| 40    | 960  | 0.3268          |
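
The convergence trend is easier to see plotted. A minimal matplotlib sketch using only the checkpoints reported above:

```python
import matplotlib.pyplot as plt

# Validation-loss checkpoints from the table above
epochs = [1, 5, 10, 14, 18, 40]
val_loss = [0.5601, 0.3421, 0.2901, 0.2737, 0.2548, 0.3268]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss over training")
plt.show()
```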

## πŸ›  Intended Uses

- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks (see the sketch below).
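
For the second use case, the checkpoint can be given a fresh classification head for a new label set. A minimal sketch; the label count here is hypothetical:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Hypothetical new task with 5 classes; the classification head is
# re-initialized because its shape differs from the original checkpoint.
model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=5,
    ignore_mismatched_sizes=True,
)
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
```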

## ⚠ Limitations

- Trained on a custom dataset, so it may not generalize well to unrelated domains without additional fine-tuning.
- The training data has not been analyzed for fairness or bias, so no guarantees are made about ethical implications.

## πŸš€ How to Use

You can use this model in two main ways:

### 1️⃣ Using the High-Level `pipeline` API

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
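
The pipeline also accepts local file paths or `PIL.Image` objects, and passing `top_k` (e.g. `pipe(url, top_k=3)`) limits the output to the highest-scoring labels.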

### 2️⃣ Using the Processor and Model Directly

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_id = logits.argmax(-1).item()

print("Predicted class:", model.config.id2label[predicted_class_id])
```