---
license: mit
---
This is a masked autoencoder (MAE) trained on an anime dataset. The main goal is a model that is efficient for image search, retrieval, and clustering.
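Once images are encoded, search and retrieval reduce to nearest-neighbor lookup over the embeddings. A minimal sketch of that lookup with cosine similarity, using toy 4-dim vectors in place of the encoder's 512-dim output (the function names here are illustrative, not part of this model's API):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, gallery):
    # index of the gallery embedding most similar to the query
    return max(range(len(gallery)), key=lambda i: cosine(query, gallery[i]))

# toy 4-dim embeddings standing in for real 512-dim encoder outputs
gallery = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.7, 0.7, 0.0, 0.1]]
query = [0.9, 0.1, 0.0, 0.0]
print(nearest(query, gallery))  # -> 0
```

In practice you would precompute gallery embeddings once and use an approximate nearest-neighbor index for large collections; the brute-force loop above is just the conceptual core.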
The model has two parts: an encoder and a decoder. The encoder encodes the full image into an 8x512 embedding and the masked image into an 8 x (28x28/10) x 512 embedding; the decoder then tries to reconstruct the image from those embeddings.
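The patch arithmetic above can be sketched as follows, assuming "28x28/10" means a 28x28 patch grid with roughly one tenth of the patches kept visible (MAE-style random masking); the constants and function below are illustrative, not this model's actual code:

```python
import random

PATCH_GRID = 28           # 28x28 patch grid, per the card
EMBED_DIM = 512           # per-token embedding dimension, per the card

num_patches = PATCH_GRID * PATCH_GRID   # 784 patch tokens total
num_visible = num_patches // 10         # ~78 tokens fed to the encoder

def sample_visible_patches(seed=0):
    # MAE-style random masking: keep a small random subset of patch indices
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), num_visible))

visible = sample_visible_patches()
print(len(visible), num_patches, EMBED_DIM)  # 78 784 512
```

The decoder then receives the visible-token embeddings (plus mask tokens for the dropped positions) and is trained to reconstruct the full image.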
The encoder architecture is LocalViT-small but with 16 layers instead of 12; the decoder is a simple transformer with a LocalViT-style MLP.
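As a quick reference, the stated hyperparameters can be summarized in a config sketch; the depth and embedding dimension come from this card, while the head count is an assumption not stated here:

```python
# Hedged config sketch of the encoder described above.
encoder_cfg = {
    "arch": "LocalViT-small",
    "depth": 16,        # card: 16 layers instead of the usual 12
    "embed_dim": 512,   # card: 512-dim token embeddings
    "num_heads": 8,     # assumption: not stated in the card
}
print(encoder_cfg["depth"], encoder_cfg["embed_dim"])  # -> 16 512
```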