---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- chihiro
- studio-ghibli
- custom-dataset
metrics:
- accuracy
- precision
- recall
model-index:
- name: chihiro-classifier-vit
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Custom Ghibli Dataset
      type: imagefolder
    metrics:
    - name: Test Accuracy
      type: accuracy
      value: 0.9333
    - name: Zero-shot CLIP Accuracy
      type: accuracy
      value: 0.8667
    - name: Zero-shot Precision
      type: precision
      value: 0.8909
    - name: Zero-shot Recall
      type: recall
      value: 0.8667
---

<!-- This model card was customized based on training logs and evaluation metrics. -->

# chihiro-classifier-vit

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224), trained on a small custom binary classification dataset of images labeled either "chihiro" or "not chihiro" (characters from Studio Ghibli films).

It was trained using PyTorch with transfer learning on a dataset of approximately 148 images.

## Model description

The model classifies images into one of two categories: **Chihiro** or **Not Chihiro**. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output. Data augmentation was used during training to improve generalization. Techniques included random horizontal flip, rotation (30°), color jitter, and random resized crop.
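
The exact augmentation parameters are not published; the following is a minimal `torchvision` sketch of the pipeline described above. The jitter strengths, crop scale, and normalization statistics are assumptions; only the list of techniques and the 30° rotation come from this card.

```python
from torchvision import transforms

# Training-time augmentation as described above. Values other than the
# 30-degree rotation are illustrative assumptions, not the training values.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),    # random resized crop
    transforms.RandomHorizontalFlip(),                      # random horizontal flip
    transforms.RandomRotation(30),                          # rotation (30°)
    transforms.ColorJitter(0.2, 0.2, 0.2),                  # color jitter
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]), # ViT-style normalization (assumed)
])

# Deterministic preprocessing for validation/test.
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
```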

## Intended uses & limitations

**Intended Uses:**
- Student computer vision project

**Limitations:**
- Small dataset may limit real-world performance
- Not robust to domain shift or artistic variation
- Not intended for production deployment

## Training and evaluation data

- Custom image dataset of Chihiro vs. non-Chihiro characters
- Loaded using Hugging Face's `imagefolder` format (see the loading sketch after this list)
- Split: 80% train, 10% validation, 10% test
- Augmentation applied during training; deterministic preprocessing during eval
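
The data-loading code is not published; the following is a minimal 🤗 Datasets sketch. The `data_dir` path and the two-step split are assumptions; the `imagefolder` format, the 80/10/10 ratio, and seed 42 come from this card.

```python
from datasets import load_dataset

# Hypothetical local path; the actual dataset is not published.
dataset = load_dataset("imagefolder", data_dir="data/chihiro", split="train")

# 80% train / 10% validation / 10% test, seeded for reproducibility.
split = dataset.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]

print(len(train_ds), len(val_ds), len(test_ds))  # roughly 118 / 15 / 15 for ~148 images
```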

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-loop sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam
- num_epochs: 12
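
The training loop itself is not published; below is a minimal plain-PyTorch sketch consistent with the hyperparameters above, matching the card's note that a custom PyTorch loop was used rather than the Trainer API. Loading the backbone via `transformers`, the [CLS]-token head, and the cross-entropy loss are assumptions.

```python
import torch
from torch import nn
from transformers import ViTModel

torch.manual_seed(42)  # seed: 42

# ViT backbone with a custom binary head, as described under "Model description".
class ChihiroClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = ViTModel.from_pretrained("google/vit-base-patch16-224")
        self.head = nn.Linear(self.backbone.config.hidden_size, 2)

    def forward(self, pixel_values):
        cls = self.backbone(pixel_values=pixel_values).last_hidden_state[:, 0]  # [CLS] token
        return self.head(cls)

model = ChihiroClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning_rate: 0.0001
criterion = nn.CrossEntropyLoss()  # assumed loss for the 2-class head

def train_step(pixel_values, labels):
    """One step; pixel_values [32, 3, 224, 224], labels [32] (train_batch_size: 32)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(pixel_values), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```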

### Training results

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|:-----:|:----------:|:---------:|:--------:|:-------:|
| 1 | 0.8325 | 58.47% | 0.7285 | 46.67% |
| 2 | 0.6038 | 55.08% | 0.6931 | 60.00% |
| 3 | 0.6047 | 67.80% | 0.6170 | 66.67% |
| 4 | 0.4854 | 77.97% | 0.7272 | 66.67% |
| 5 | 0.3989 | 79.66% | 0.5494 | 66.67% |
| 6 | 0.3091 | 88.14% | 0.4649 | 86.67% |
| 7 | 0.2651 | 88.98% | 0.5736 | 73.33% |
| 8 | 0.2043 | 94.07% | 0.5335 | 73.33% |
| 9 | 0.2668 | 87.29% | 0.5765 | 80.00% |
| 10 | 0.2408 | 87.29% | 0.5346 | 73.33% |
| 11 | 0.1047 | 95.76% | 0.4125 | 73.33% |
| 12 | 0.1297 | 94.07% | 0.4084 | 86.67% |

### Final Test Evaluation

- `Test Loss`: 0.3677
- `Test Accuracy`: 0.7333

## 🧪 Zero-Shot CLIP Comparison

Evaluated using `openai/clip-vit-base-patch32` with no fine-tuning (a sketch of the setup follows this list):

- `Zero-shot Accuracy`: 86.67%
- `Precision`: 0.8909
- `Recall`: 0.8667
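
A minimal sketch of the zero-shot comparison; the prompt wording and image path are assumptions (the card gives only the checkpoint name).

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class prompts; the exact wording used for the comparison is not given.
prompts = [
    "a frame of Chihiro from Spirited Away",
    "a frame of a Studio Ghibli character who is not Chihiro",
]

image = Image.open("example.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape [1, 2]

pred = logits.softmax(dim=-1).argmax(dim=-1).item()
print("chihiro" if pred == 0 else "not chihiro")
```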

## Framework versions

- Transformers: not used (custom PyTorch)
- PyTorch: 2.x
- Datasets: 2.x
- Tokenizers: N/A