---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
  - image-classification
  - chihiro
  - studio-ghibli
  - custom-dataset
metrics:
  - accuracy
  - precision
  - recall
model-index:
  - name: chihiro-classifier-vit
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: Custom Ghibli Dataset
          type: imagefolder
        metrics:
          - name: Test Accuracy
            type: accuracy
            value: 0.9333
          - name: Zero-shot CLIP Accuracy
            type: accuracy
            value: 0.8667
          - name: Zero-shot Precision
            type: precision
            value: 0.8909
          - name: Zero-shot Recall
            type: recall
            value: 0.8667
---

# chihiro-classifier-vit

This model is a fine-tuned version of google/vit-base-patch16-224, trained on a small custom binary classification dataset of images from Studio Ghibli films, each labeled either "chihiro" or "not chihiro".

It was trained using PyTorch with transfer learning on a dataset of approximately 148 images.

## Model description

The model classifies images into one of two categories: Chihiro or Not Chihiro. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output. Data augmentation was used during training to improve generalization. Techniques included random horizontal flip, rotation (30°), color jitter, and random resized crop.
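A minimal sketch of such an augmentation pipeline with torchvision transforms is shown below. Only the 30° rotation is stated above; the jitter strengths and crop parameters are assumptions, and the 0.5 mean/std normalization matches the ViT checkpoint's preprocessing.

```python
from torchvision import transforms

VIT_MEAN = (0.5, 0.5, 0.5)  # google/vit-base-patch16-224 normalizes inputs to [-1, 1]
VIT_STD = (0.5, 0.5, 0.5)

# Training-time augmentation: the four techniques listed above.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random resized crop to the ViT input size
    transforms.RandomHorizontalFlip(),      # random horizontal flip
    transforms.RandomRotation(30),          # rotation up to 30 degrees
    transforms.ColorJitter(brightness=0.2,  # color jitter (strengths assumed)
                           contrast=0.2,
                           saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(VIT_MEAN, VIT_STD),
])

# Deterministic preprocessing for validation and test.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(VIT_MEAN, VIT_STD),
])
```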

## Intended uses & limitations

Intended uses:

- Student computer vision project

Limitations:

- Small dataset may limit real-world performance
- Not robust to domain shift or artistic variation
- Not intended for production deployment

## Training and evaluation data

- Custom image dataset of Chihiro vs. non-Chihiro characters
- Loaded using Hugging Face's imagefolder format (see the loading sketch after this list)
- Split: 80% train, 10% validation, 10% test
- Augmentation applied during training; deterministic preprocessing during eval
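A minimal loading sketch, assuming a local folder with one subdirectory per class; the `data_dir` path is a placeholder, and only the 80/10/10 ratios (not the split mechanics) come from the card:

```python
from datasets import load_dataset

# imagefolder infers labels from subdirectory names, e.g.
# data/chihiro/*.png and data/not_chihiro/*.png (paths are placeholders).
ds = load_dataset("imagefolder", data_dir="data")["train"]

# 80/10/10 split: hold out 20%, then split the holdout in half.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
```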

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the training setup follows the list):

- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam
- num_epochs: 12
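A sketch of a fine-tuning loop under these hyperparameters. Loading the backbone via `ViTForImageClassification` is an assumption (the card notes a custom PyTorch loop), and `train_dataset` stands in for the augmented training split:

```python
import torch
from torch.utils.data import DataLoader
from transformers import ViTForImageClassification

torch.manual_seed(42)  # seed: 42

# Replace the 1000-class ImageNet head with a 2-class head.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,                  # chihiro / not chihiro
    ignore_mismatched_sizes=True,
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning_rate: 0.0001
loss_fn = torch.nn.CrossEntropyLoss()

# `train_dataset` is assumed to yield (pixel_values, label) pairs,
# e.g. the train split with train_transform applied.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

model.train()
for epoch in range(12):  # num_epochs: 12
    for pixel_values, labels in train_loader:
        optimizer.zero_grad()
        logits = model(pixel_values=pixel_values).logits
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
```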

### Training results

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------:|-----------:|----------:|---------:|--------:|
| 1 | 0.8325 | 58.47% | 0.7285 | 46.67% |
| 2 | 0.6038 | 55.08% | 0.6931 | 60.00% |
| 3 | 0.6047 | 67.80% | 0.6170 | 66.67% |
| 4 | 0.4854 | 77.97% | 0.7272 | 66.67% |
| 5 | 0.3989 | 79.66% | 0.5494 | 66.67% |
| 6 | 0.3091 | 88.14% | 0.4649 | 86.67% |
| 7 | 0.2651 | 88.98% | 0.5736 | 73.33% |
| 8 | 0.2043 | 94.07% | 0.5335 | 73.33% |
| 9 | 0.2668 | 87.29% | 0.5765 | 80.00% |
| 10 | 0.2408 | 87.29% | 0.5346 | 73.33% |
| 11 | 0.1047 | 95.76% | 0.4125 | 73.33% |
| 12 | 0.1297 | 94.07% | 0.4084 | 86.67% |

### Final test evaluation

- Test Loss: 0.3677
- Test Accuracy: 0.7333

## 🧪 Zero-shot CLIP comparison

Evaluated using openai/clip-vit-base-patch32 with no fine-tuning (a sketch of the evaluation appears below):

- Zero-shot Accuracy: 86.67%
- Precision: 0.8909
- Recall: 0.8667
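A sketch of how such a zero-shot baseline can be run with Hugging Face Transformers; the prompt wording and image path are assumptions, not taken from the original evaluation:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# One text prompt per class; the phrasing is an assumption.
prompts = [
    "a picture of Chihiro from Spirited Away",
    "a picture of a Studio Ghibli character who is not Chihiro",
]

image = Image.open("example.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

# Score the image against both prompts and pick the more probable label.
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

label = "chihiro" if probs[0, 0] > probs[0, 1] else "not chihiro"
print(label, probs.tolist())
```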

## Framework versions

- Transformers: not used (custom PyTorch training loop)
- PyTorch: 2.x
- Datasets: 2.x
- Tokenizers: N/A