---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- chihiro
- studio-ghibli
- custom-dataset
metrics:
- accuracy
- precision
- recall
model-index:
- name: chihiro-classifier-vit
results:
- task:
type: image-classification
name: Image Classification
dataset:
name: Custom Ghibli Dataset
type: imagefolder
metrics:
- name: Test Accuracy
type: accuracy
value: 0.9333
- name: Zero-shot CLIP Accuracy
type: accuracy
value: 0.8667
- name: Zero-shot Precision
type: precision
value: 0.8909
- name: Zero-shot Recall
type: recall
value: 0.8667
---
<!-- This model card was customized based on training logs and evaluation metrics. -->
# chihiro-classifier-vit
This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) for a small, custom binary classification task: images from Studio Ghibli films labeled either "chihiro" or "not chihiro".
It was trained with PyTorch using transfer learning on approximately 148 images.
## Model description
The model classifies images into one of two categories: **Chihiro** or **Not Chihiro**. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output. Data augmentation was used during training to improve generalization. Techniques included random horizontal flip, rotation (30°), color jitter, and random resized crop.
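The augmentation pipeline described above could be expressed with `torchvision.transforms` roughly as follows; only the 30° rotation and the list of techniques come from the description, the remaining parameter values are illustrative assumptions.

```python
from torchvision import transforms

# Sketch of the training-time augmentation pipeline described above.
# The 224x224 target size matches the ViT-Base/16 input resolution;
# the ColorJitter strengths and crop scale are illustrative assumptions.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Deterministic preprocessing for validation / test.
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```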
## Intended uses & limitations
**Intended Uses:**
- Student computer vision project
**Limitations:**
- Small dataset may limit real-world performance
- Not robust to domain shift or artistic variation
- Not intended for production deployment
## Training and evaluation data
- Custom image dataset of Chihiro vs. non-Chihiro characters
- Loaded using Hugging Face's `imagefolder` format (see the loading sketch after this list)
- Split: 80% train, 10% validation, 10% test
- Augmentation applied during training; deterministic preprocessing during eval
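A minimal sketch of loading and splitting such a dataset; the directory path and the two-step split are assumptions, while the `imagefolder` loader and the 80/10/10 ratio follow the description above.

```python
from datasets import load_dataset

# Load a folder-per-class image dataset
# (e.g. ghibli_dataset/chihiro/*.jpg, ghibli_dataset/not_chihiro/*.jpg).
# The directory name is a placeholder.
dataset = load_dataset("imagefolder", data_dir="./ghibli_dataset", split="train")

# 80% train, 10% validation, 10% test (two-step split, seed fixed for reproducibility).
split = dataset.train_test_split(test_size=0.2, seed=42)
val_test = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], val_test["train"], val_test["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```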
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a minimal training-loop sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam
- num_epochs: 12
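A minimal transfer-learning loop under these hyperparameters might look like the sketch below. Loading the backbone through `transformers.ViTForImageClassification` is an assumption made for brevity; the card describes the actual training code as custom PyTorch, and the data loaders are assumed to exist.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from transformers import ViTForImageClassification

# Replace the original 1000-class head with a binary head: chihiro / not chihiro.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# train_loader is assumed to yield (pixel_values, labels) batches of size 32,
# built from the augmented transforms shown earlier.
def train_one_epoch(train_loader: DataLoader):
    model.train()
    for pixel_values, labels in train_loader:
        pixel_values, labels = pixel_values.to(device), labels.to(device)
        logits = model(pixel_values=pixel_values).logits
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# for epoch in range(12):
#     train_one_epoch(train_loader)
```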
### Training results
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|:-----:|:----------:|:---------:|:--------:|:-------:|
| 1 | 0.8325 | 58.47% | 0.7285 | 46.67% |
| 2 | 0.6038 | 55.08% | 0.6931 | 60.00% |
| 3 | 0.6047 | 67.80% | 0.6170 | 66.67% |
| 4 | 0.4854 | 77.97% | 0.7272 | 66.67% |
| 5 | 0.3989 | 79.66% | 0.5494 | 66.67% |
| 6 | 0.3091 | 88.14% | 0.4649 | 86.67% |
| 7 | 0.2651 | 88.98% | 0.5736 | 73.33% |
| 8 | 0.2043 | 94.07% | 0.5335 | 73.33% |
| 9 | 0.2668 | 87.29% | 0.5765 | 80.00% |
| 10 | 0.2408 | 87.29% | 0.5346 | 73.33% |
| 11 | 0.1047 | 95.76% | 0.4125 | 73.33% |
| 12 | 0.1297 | 94.07% | 0.4084 | 86.67% |
### Final Test Evaluation
- `Test Loss`: 0.3677
- `Test Accuracy`: 0.7333
## 🧪 Zero-Shot CLIP Comparison
Evaluated using `openai/clip-vit-base-patch32` with no fine-tuning (a sketch of this zero-shot setup follows the metrics):
- `Zero-shot Accuracy`: 86.67%
- `Precision`: 0.8909
- `Recall`: 0.8667
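A sketch of how such a zero-shot baseline can be computed with `CLIPModel`; the text prompts and image path below are illustrative assumptions, not necessarily the ones used for the reported numbers.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot baseline with openai/clip-vit-base-patch32.
# The prompt wording is an assumption; different prompts will change the scores.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a picture of Chihiro from the film Spirited Away",
    "a picture of a Studio Ghibli character who is not Chihiro",
]

image = Image.open("example.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # shape: (1, 2)

probs = logits_per_image.softmax(dim=-1)
pred = "chihiro" if probs[0, 0] > probs[0, 1] else "not chihiro"
print(pred, probs.tolist())
```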
## Framework versions
- Transformers: not used (custom PyTorch)
- PyTorch: 2.x
- Datasets: 2.x
- Tokenizers: N/A