Commit d912803 · 1 Parent(s): e6eba41
D. J.: Updated README.md

README.md CHANGED
````diff
@@ -5,17 +5,17 @@ tags:
 - image classification
 - recaptchav2
 datasets:
-- recaptchav2-
+- recaptchav2-29k
 ---
 
 # Finetuned Vision Transformer
 
-This repository contains a Vision Transformer (ViT) model fine-tuned on the ReCAPTCHAv2 dataset.
+This repository contains a Vision Transformer (ViT) model fine-tuned on the ReCAPTCHAv2-29k dataset.
 The dataset comprises 29,568 labeled images spanning 5 classes, each resized to a resolution of 224×224 pixels.
 
 ## Model description
 
-This model builds on a pre-trained ViT backbone and is fine-tuned on the ReCAPTCHAv2 dataset.
+This model builds on a pre-trained ViT backbone and is fine-tuned on the ReCAPTCHAv2-29k dataset.
 It leverages the transformer-based architecture to capture global contextual information effectively, making it well-suited for tasks with diverse visual patterns like ReCAPTCHA classification.
 
 ## Intended uses & limitations
@@ -31,7 +31,7 @@ The model is particularly useful in academic and experimental contexts where und
 
 ## How to use
 
-Here is how to use this model to classify an image of the ReCAPTCHAv2 dataset into one of the 5 classes:
+Here is how to use this model to classify an image from the ReCAPTCHAv2-29k dataset into one of the 5 classes:
 
 ```python
 import requests
@@ -39,7 +39,7 @@ import torch
 from PIL import Image
 from transformers import ViTForImageClassification, ViTImageProcessor
 
-url = "https://raw.githubusercontent.com/nobodyPerfecZ/recaptchav2-
+url = "https://raw.githubusercontent.com/nobodyPerfecZ/recaptchav2-29k/refs/heads/master/data/bicycle/bicycle_0.png"
 image = Image.open(requests.get(url, stream=True).raw)
 processor = ViTImageProcessor.from_pretrained(
     "nobodyPerfecZ/vit-finetuned-patch16-224-recaptchav2-v1"
@@ -59,7 +59,7 @@ print(f"Predicted labels: {labels}")
 
 ## Training data
 
-The ViT model was fine-tuned on [ReCAPTCHAv2 dataset](https://huggingface.co/datasets/nobodyPerfecZ/recaptchav2-
+The ViT model was fine-tuned on the [ReCAPTCHAv2-29k dataset](https://huggingface.co/datasets/nobodyPerfecZ/recaptchav2-29k), which consists of 29,568 images across 5 classes.
 
 ## Training procedure
 
@@ -71,7 +71,7 @@ Images are resized/rescaled to the same resolution (224x224) and normalized acro
 
 ## Evaluation results
 
-The ViT model was evaluated on a held-out test set from the ReCAPTCHAv2 dataset.
+The ViT model was evaluated on a held-out test set from the ReCAPTCHAv2-29k dataset.
 Two key metrics were used to assess performance:
 
 | Metric | Score |
````
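One hunk header also notes that images are resized/rescaled to 224x224 and normalized across the RGB channels, which is exactly what `ViTImageProcessor` handles. Continuing from the variables in the sketch above, a quick check (hypothetical, not part of the README) shows the tensor the model actually receives:

```python
# Hypothetical sanity check: the processor should emit a batch of
# 3-channel 224x224 tensors, normalized per RGB channel.
inputs = processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 224, 224])
```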