itsJasminZWIN/chihiro-classifier (Image Classification)

	---
	library_name: transformers
	license: apache-2.0
	base_model: google/vit-base-patch16-224
	tags:
	- image-classification
	- chihiro
	- studio-ghibli
	- custom-dataset
	metrics:
	- accuracy
	- precision
	- recall
	model-index:
	- name: chihiro-classifier-vit
	  results:
	  - task:
	      type: image-classification
	      name: Image Classification
	    dataset:
	      name: Custom Ghibli Dataset
	      type: imagefolder
	    metrics:
	    - name: Test Accuracy
	      type: accuracy
	      value: 0.9333
	    - name: Zero-shot CLIP Accuracy
	      type: accuracy
	      value: 0.8667
	    - name: Zero-shot Precision
	      type: precision
	      value: 0.8909
	    - name: Zero-shot Recall
	      type: recall
	      value: 0.8667
	---

	<!-- This model card was customized based on training logs and evaluation metrics. -->

	# chihiro-classifier-vit

	This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) trained on a small, custom binary classification dataset consisting of images labeled either "chihiro" or "not chihiro" (from Studio Ghibli films).

	It was trained using PyTorch with transfer learning on a dataset of approximately 148 images.

	## Model description

	The model classifies images into one of two categories: Chihiro or Not Chihiro. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output. Training used data augmentation to improve generalization: random horizontal flip, random rotation (30°), color jitter, and random resized crop.
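The card lists the augmentations but not their parameters or the library that implemented them. Random resized crop is the least self-explanatory of the four, so here is a plain-Python sketch of how such a crop box is typically sampled. It follows the scheme used by torchvision's `RandomResizedCrop` (area fraction drawn from `scale`, aspect ratio drawn log-uniformly from `ratio`, centered fallback after 10 failed draws) with torchvision's defaults `scale=(0.08, 1.0)` and `ratio=(3/4, 4/3)`. Whether this model used torchvision or those defaults is an assumption, and `random_resized_crop_box` is a hypothetical helper name.

```python
import math
import random


def random_resized_crop_box(height, width, scale=(0.08, 1.0),
                            ratio=(3 / 4, 4 / 3), rng=random):
    """Return (top, left, h, w) of a random crop box inside a height x width image.

    Assumed parameters: torchvision-style defaults for `scale` and `ratio`.
    """
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        # Fraction of the original area to keep, and a log-uniform aspect ratio.
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            # randint is inclusive, so the box always stays inside the image.
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: central crop whose aspect ratio is clamped into `ratio`.
    in_ratio = width / height
    if in_ratio < min(ratio):
        w, h = width, int(round(width / min(ratio)))
    elif in_ratio > max(ratio):
        h, w = height, int(round(height * max(ratio)))
    else:
        h, w = height, width
    return (height - h) // 2, (width - w) // 2, h, w


box = random_resized_crop_box(375, 500, rng=random.Random(42))
print(box)
```

The sampled box would then be resized to 224×224, the input resolution of `google/vit-base-patch16-224`, so each epoch sees a differently framed and scaled view of the same image.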

	## Intended uses & limitations

	Intended Uses:
	- Student computer vision project

	Limitations:
	- Small dataset may limit real-world performance
	- Not robust to domain shift or artistic variation
	- Not intended for production deployment

	## Training and evaluation data

	- Custom image dataset of Chihiro vs. non-Chihiro characters
	- Loaded using Hugging Face's `imagefolder` format
	- Split: 80% train, 10% validation, 10% test
	- Augmentation applied during training; deterministic preprocessing during eval
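The card reports an 80/10/10 split but not how it was produced. Below is a minimal seeded split in plain Python; the helper name `split_indices`, the rounding rule, and reusing seed 42 from the hyperparameter list are all assumptions. With 148 images it yields 118/15/15, which is consistent with the accuracies in the training table (for example 69/118 = 58.47% train and 7/15 = 46.67% validation at epoch 1).

```python
import random


def split_indices(n, train_frac=0.8, seed=42):
    """Shuffle indices 0..n-1 and cut them into train/val/test index lists.

    The train share is rounded to the nearest integer; the remainder is
    halved between validation and test (test takes any odd leftover item).
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train_frac * n)
    n_val = (n - n_train) // 2
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(148)
print(len(train), len(val), len(test))  # 118 15 15
```

With only 15 validation and 15 test images, every misclassified image shifts the reported accuracy by 6.67 percentage points.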

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam
	- num_epochs: 12
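The list names Adam with a learning rate of 0.0001 but no other optimizer settings. As a sketch of what one optimizer step does, here is Adam on plain Python floats; the betas (0.9, 0.999) and eps (1e-8) are the `torch.optim.Adam` defaults and are assumed here, not documented for this model. The function `adam_step` is hypothetical.

```python
import math


def adam_step(params, grads, state, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
    """Apply one Adam update in place to a list of scalar parameters.

    `state` holds the step count `t` and running moments `m` and `v`.
    """
    b1, b2 = betas
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g      # first moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g  # second moment
        m_hat = state["m"][i] / (1 - b1 ** t)  # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params = [0.5, -0.2]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
adam_step(params, grads=[3.0, -0.01], state=state)
print(params)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is about ±1, so each parameter moves by roughly the learning rate (1e-4) whether its gradient is 3.0 or -0.01.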

	### Training results

	| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
	|:-----:|:----------:|:---------:|:--------:|:-------:|
	| 1 | 0.8325 | 58.47% | 0.7285 | 46.67% |
	| 2 | 0.6038 | 55.08% | 0.6931 | 60.00% |
	| 3 | 0.6047 | 67.80% | 0.6170 | 66.67% |
	| 4 | 0.4854 | 77.97% | 0.7272 | 66.67% |
	| 5 | 0.3989 | 79.66% | 0.5494 | 66.67% |
	| 6 | 0.3091 | 88.14% | 0.4649 | 86.67% |
	| 7 | 0.2651 | 88.98% | 0.5736 | 73.33% |
	| 8 | 0.2043 | 94.07% | 0.5335 | 73.33% |
	| 9 | 0.2668 | 87.29% | 0.5765 | 80.00% |
	| 10 | 0.2408 | 87.29% | 0.5346 | 73.33% |
	| 11 | 0.1047 | 95.76% | 0.4125 | 73.33% |
	| 12 | 0.1297 | 94.07% | 0.4084 | 86.67% |
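The card does not say which epoch's weights produced the final test numbers. The sketch below re-reads the logged validation columns: selecting by lowest validation loss would pick the final epoch (12), while validation accuracy ties epochs 6 and 12. It also converts accuracies back to image counts, assuming a 15-image validation split (every logged validation accuracy is a multiple of 1/15), which shows that a swing such as 86.67% to 73.33% is just two images. Checkpoint selection by lowest validation loss is an assumption, not the documented procedure.

```python
# (epoch, val_loss, val_acc_percent), copied from the training-results table.
history = [
    (1, 0.7285, 46.67), (2, 0.6931, 60.00), (3, 0.6170, 66.67),
    (4, 0.7272, 66.67), (5, 0.5494, 66.67), (6, 0.4649, 86.67),
    (7, 0.5736, 73.33), (8, 0.5335, 73.33), (9, 0.5765, 80.00),
    (10, 0.5346, 73.33), (11, 0.4125, 73.33), (12, 0.4084, 86.67),
]
VAL_SIZE = 15  # assumption: inferred from the accuracies being multiples of 1/15

best_by_loss = min(history, key=lambda row: row[1])
best_acc = max(acc for _, _, acc in history)
tied_on_acc = [epoch for epoch, _, acc in history if acc == best_acc]
# Convert percentages back to correctly classified images out of VAL_SIZE.
counts = [round(acc / 100 * VAL_SIZE) for _, _, acc in history]

print("lowest val loss at epoch", best_by_loss[0])  # 12
print("best val acc at epochs", tied_on_acc)        # [6, 12]
print("correct val images per epoch", counts)
```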


	### Final Test Evaluation

	- `Test Loss`: 0.3677
	- `Test Accuracy`: 0.7333


	## 🧪 Zero-Shot CLIP Comparison

	Evaluated using `openai/clip-vit-base-patch32` with no fine-tuning:

	- `Zero-shot Accuracy`: 86.67%
	- `Precision`: 0.8909
	- `Recall`: 0.8667
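The card does not give the text prompts used for the zero-shot baseline or how precision and recall were averaged. The sketch below shows, on toy vectors, the usual CLIP zero-shot rule (softmax over scaled cosine similarities between one image embedding and one text embedding per class prompt) together with support-weighted precision and recall. In practice the embeddings would come from `CLIPModel.get_image_features` and `get_text_features` in transformers; here they are hypothetical two-dimensional vectors, and the labels and predictions are made up for illustration. With support weighting, recall reduces to overall accuracy, which would explain why the reported zero-shot recall (0.8667) equals the zero-shot accuracy; whether weighted averaging was actually used is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_predict(image_emb, prompt_embs, logit_scale=100.0):
    """Return (best_index, probabilities) over the class prompts.

    The scale sharpens the softmax but never changes which prompt wins.
    """
    logits = [logit_scale * cosine(image_emb, t) for t in prompt_embs]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


def weighted_precision_recall(y_true, y_pred, labels):
    """Per-class precision and recall, averaged with weights = class support."""
    n = len(y_true)
    precision = recall = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        predicted = sum(1 for p in y_pred if p == c)
        weight = support / n
        precision += weight * (tp / predicted if predicted else 0.0)
        recall += weight * (tp / support if support else 0.0)
    return precision, recall


# Hypothetical embeddings: prompt 0 stands for "chihiro", prompt 1 for "not chihiro".
idx, probs = zero_shot_predict([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])

# Hypothetical labels and predictions for 10 images (not this model's data).
y_true = ["chihiro"] * 4 + ["not chihiro"] * 6
y_pred = ["chihiro"] * 2 + ["not chihiro"] * 8
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
prec, rec = weighted_precision_recall(y_true, y_pred, ["chihiro", "not chihiro"])
print(idx, round(acc, 4), round(prec, 4), round(rec, 4))  # 0 0.8 0.85 0.8
```

Weighted recall equals accuracy because each class's recall TP_c / n_c is multiplied by its weight n_c / N, so the sum is the total number of correct predictions over N; weighted precision has no such cancellation and can differ.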

	## Framework versions

	- Transformers: not used (custom PyTorch)
	- PyTorch: 2.x
	- Datasets: 2.x
	- Tokenizers: N/A