---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- chihiro
- studio-ghibli
- custom-dataset
metrics:
- accuracy
- precision
- recall
model-index:
- name: chihiro-classifier-vit
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Custom Ghibli Dataset
      type: imagefolder
    metrics:
    - name: Test Accuracy
      type: accuracy
      value: 0.9333
    - name: Zero-shot CLIP Accuracy
      type: accuracy
      value: 0.8667
    - name: Zero-shot Precision
      type: precision
      value: 0.8909
    - name: Zero-shot Recall
      type: recall
      value: 0.8667
---

<!-- This model card was customized based on training logs and evaluation metrics. -->

# chihiro-classifier-vit

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224), trained on a small, custom binary classification dataset consisting of images labeled either "chihiro" or "not chihiro" (from Studio Ghibli films).

It was trained using PyTorch with transfer learning on a dataset of approximately 148 images.

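For quick usage, the sketch below shows one way to run inference with the fine-tuned checkpoint. It assumes the weights were pushed to this repository in standard `transformers` format (as `library_name: transformers` in the metadata suggests), and `your-username/chihiro-classifier-vit` is a placeholder repo id; if the checkpoint exists only as a raw PyTorch state dict, the loading step would differ.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Placeholder repository id; substitute the actual Hub repo for this model.
repo_id = "your-username/chihiro-classifier-vit"

processor = AutoImageProcessor.from_pretrained(repo_id)
model = AutoModelForImageClassification.from_pretrained(repo_id)
model.eval()

# Classify a single image as "chihiro" vs. "not chihiro".
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])
```
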
## Model description

The model classifies images into one of two categories: **Chihiro** or **Not Chihiro**. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output. Data augmentation was used during training to improve generalization. Techniques included random horizontal flip, rotation (30°), color jitter, and random resized crop.

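The exact augmentation code is not part of this card; the snippet below is a plausible torchvision reconstruction of the listed transforms. The 224×224 input size, the crop scale, the jitter strengths, and the 0.5 mean/std normalization (the ViT image-processor default) are assumptions.

```python
from torchvision import transforms

# Normalization matching the ViT image-processor defaults (assumed).
MEAN, STD = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]

# Augmented preprocessing for training images.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])

# Deterministic preprocessing for validation and test images.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])
```
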
## Intended uses & limitations

**Intended uses:**
- Student computer vision project

**Limitations:**
- Small dataset may limit real-world performance
- Not robust to domain shift or artistic variation
- Not intended for production deployment

## Training and evaluation data

- Custom image dataset of Chihiro vs. non-Chihiro characters
- Loaded using Hugging Face's `imagefolder` format (see the loading sketch below)
- Split: 80% train, 10% validation, 10% test
- Augmentation applied during training; deterministic preprocessing during evaluation

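The loading code is not included in the card; below is a minimal sketch of how an `imagefolder` dataset with an 80/10/10 split could be produced with 🤗 Datasets. The `data/` directory layout and the two-step split are assumptions.

```python
from datasets import load_dataset

# Assumed layout: data/chihiro/*.jpg and data/not_chihiro/*.jpg
dataset = load_dataset("imagefolder", data_dir="data")["train"]

# Hold out 20%, then split the held-out portion evenly into validation and test.
split = dataset.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```
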
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam
- num_epochs: 12

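The training script itself is not included; below is a minimal PyTorch fine-tuning sketch consistent with the hyperparameters above. The use of `ViTForImageClassification` for the backbone plus head, the `train_loader` DataLoader, and the device handling are assumptions.

```python
import torch
from torch import nn
from transformers import ViTForImageClassification

torch.manual_seed(42)
device = "cuda" if torch.cuda.is_available() else "cpu"

# ViT backbone with a fresh 2-way classification head (assumed construction).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    id2label={0: "not chihiro", 1: "chihiro"},
    label2id={"not chihiro": 0, "chihiro": 1},
    ignore_mismatched_sizes=True,  # drop the original 1000-class head
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(12):
    model.train()
    for pixel_values, labels in train_loader:  # assumed DataLoader of (image tensor, label)
        pixel_values, labels = pixel_values.to(device), labels.to(device)
        logits = model(pixel_values=pixel_values).logits
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
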
### Training results

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|:-----:|:----------:|:---------:|:--------:|:-------:|
| 1     | 0.8325     | 58.47%    | 0.7285   | 46.67%  |
| 2     | 0.6038     | 55.08%    | 0.6931   | 60.00%  |
| 3     | 0.6047     | 67.80%    | 0.6170   | 66.67%  |
| 4     | 0.4854     | 77.97%    | 0.7272   | 66.67%  |
| 5     | 0.3989     | 79.66%    | 0.5494   | 66.67%  |
| 6     | 0.3091     | 88.14%    | 0.4649   | 86.67%  |
| 7     | 0.2651     | 88.98%    | 0.5736   | 73.33%  |
| 8     | 0.2043     | 94.07%    | 0.5335   | 73.33%  |
| 9     | 0.2668     | 87.29%    | 0.5765   | 80.00%  |
| 10    | 0.2408     | 87.29%    | 0.5346   | 73.33%  |
| 11    | 0.1047     | 95.76%    | 0.4125   | 73.33%  |
| 12    | 0.1297     | 94.07%    | 0.4084   | 86.67%  |

### Final Test Evaluation

- `Test Loss`: 0.3677
- `Test Accuracy`: 0.7333

## 🧪 Zero-Shot CLIP Comparison

For comparison, the same classification task was evaluated with `openai/clip-vit-base-patch32` in a zero-shot setting (no fine-tuning):

- `Zero-shot Accuracy`: 86.67%
- `Precision`: 0.8909
- `Recall`: 0.8667

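The zero-shot setup is not detailed in the card; the sketch below shows a typical CLIP zero-shot classification setup for this task. The prompt wording is an assumption.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Prompt wording is assumed; the card does not specify the exact prompts used.
prompts = [
    "a picture of Chihiro from Spirited Away",
    "a picture of a Studio Ghibli character who is not Chihiro",
]

image = Image.open("example.jpg").convert("RGB")
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # image-text similarity

pred = logits_per_image.argmax(dim=-1).item()
print(prompts[pred])
```
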
## Framework versions

- Transformers: not used (custom PyTorch)
- PyTorch: 2.x
- Datasets: 2.x
- Tokenizers: N/A