Commit d912803 · 1 Parent(s): e6eba41
D. J.: Updated README.md

README.md CHANGED
````diff
@@ -5,17 +5,17 @@ tags:
 - image classification
 - recaptchav2
 datasets:
-- recaptchav2-
+- recaptchav2-29k
 ---
 
 # Finetuned Vision Transformer
 
-This repository contains a Vision Transformer (ViT) model fine-tuned on the ReCAPTCHAv2 dataset.
+This repository contains a Vision Transformer (ViT) model fine-tuned on the ReCAPTCHAv2-29k dataset.
 The dataset comprises 29,568 labeled images spanning 5 classes, each resized to a resolution of 224×224 pixels.
 
 ## Model description
 
-This model builds on a pre-trained ViT backbone and is fine-tuned on the ReCAPTCHAv2 dataset.
+This model builds on a pre-trained ViT backbone and is fine-tuned on the ReCAPTCHAv2-29k dataset.
 It leverages the transformer-based architecture to capture global contextual information effectively, making it well-suited for tasks with diverse visual patterns like ReCAPTCHA classification.
 
 ## Intended uses & limitations
@@ -31,7 +31,7 @@ The model is particularly useful in academic and experimental contexts where und
 
 ## How to use
 
-Here is how to use this model to classify an image of the ReCAPTCHAv2 dataset into one of the 5 classes:
+Here is how to use this model to classify an image from the ReCAPTCHAv2-29k dataset into one of the 5 classes:
 
 ```python
 import requests
@@ -39,7 +39,7 @@ import torch
 from PIL import Image
 from transformers import ViTForImageClassification, ViTImageProcessor
 
-url = "https://raw.githubusercontent.com/nobodyPerfecZ/recaptchav2-
+url = "https://raw.githubusercontent.com/nobodyPerfecZ/recaptchav2-29k/refs/heads/master/data/bicycle/bicycle_0.png"
 image = Image.open(requests.get(url, stream=True).raw)
 processor = ViTImageProcessor.from_pretrained(
     "nobodyPerfecZ/vit-finetuned-patch16-224-recaptchav2-v1"
@@ -59,7 +59,7 @@ print(f"Predicted labels: {labels}")
 
 ## Training data
 
-The ViT model was fine-tuned on [ReCAPTCHAv2 dataset](https://huggingface.co/datasets/nobodyPerfecZ/recaptchav2-
+The ViT model was fine-tuned on the [ReCAPTCHAv2-29k dataset](https://huggingface.co/datasets/nobodyPerfecZ/recaptchav2-29k), which consists of 29,568 images across 5 classes.
 
 ## Training procedure
 
@@ -71,7 +71,7 @@ Images are resized/rescaled to the same resolution (224x224) and normalized acro
 
 ## Evaluation results
 
-The ViT model was evaluated on a held-out test set from the ReCAPTCHAv2 dataset.
+The ViT model was evaluated on a held-out test set from the ReCAPTCHAv2-29k dataset.
 Two key metrics were used to assess performance:
 
 | Metric | Score |
````
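One hunk header also notes that images are resized/rescaled to 224x224 and normalized across the RGB channels, which is exactly what `ViTImageProcessor` handles. Continuing from the variables in the sketch above, a quick check (hypothetical, not part of the README) shows the tensor the model actually receives:

```python
# Hypothetical sanity check: the processor should emit a batch of
# 3-channel 224x224 tensors, normalized per RGB channel.
inputs = processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 224, 224])
```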