
	---
	language:
	- ko
	- en
	license: mit
	metrics:
	- recall
	base_model:
	- google/siglip2-base-patch16-224
	tags:
	- zero-shot-image-classification
	---

	# siglip2-base-patch16-224-ko

	This SigLIP2 model strengthens the Korean understanding of google/siglip2-base-patch16-224 by training it with the knowledge distillation method from [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813).

	Training data: AI Hub English-Korean parallel dataset

	Evaluation data: MS-COCO caption English-Korean dataset
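The objective in the cited paper trains a student so that its embeddings of both the source sentence and its translation match the teacher's embedding of the source sentence, using mean squared error. The following is a minimal sketch of that loss on toy vectors. The function names and the three-dimensional vectors are hypothetical, and this is not the training code used for this model.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en, student_en, student_ko):
    """Pull the student's English and Korean embeddings toward the teacher's English one."""
    return mse(student_en, teacher_en) + mse(student_ko, teacher_en)


# Toy embeddings (hypothetical values) for one English-Korean sentence pair.
teacher_en = [1.0, 0.0, -1.0]   # frozen teacher, English sentence
student_en = [1.0, 0.0, -1.0]   # student, same English sentence
student_ko = [0.0, 0.0, -1.0]   # student, Korean translation

loss = distillation_loss(teacher_en, student_en, student_ko)
print(f"{loss:.4f}")
```

The English term is zero here because the student already matches the teacher on English, so the whole loss comes from the Korean embedding.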

	## How to use

	```python
	import requests
	import torch
	from PIL import Image
	from transformers import AutoModel, AutoProcessor

	# Load the model and its matching processor from the Hub.
	repo = "hyunlord/siglip2-base-patch16-224-ko"
	model = AutoModel.from_pretrained(repo)
	processor = AutoProcessor.from_pretrained(repo)

	# Example image from the COCO val2017 set.
	url = "http://images.cocodataset.org/val2017/000000039769.jpg"
	image = Image.open(requests.get(url, stream=True).raw)

	# Korean candidate captions, from least to most specific.
	texts = [
	    "고양이 한 마리",
	    "고양이 두 마리",
	    "분홍색 소파에 드러누운 고양이 친구들",
	    "리모컨과 고양이 두마리",
	    "리모컨 두 개와 고양이 두마리",
	    "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리",
	]
	inputs = processor(
	    text=texts,
	    images=image,
	    padding="max_length",
	    max_length=64,
	    return_tensors="pt",
	)

	# Inference only, so gradients are not needed.
	with torch.no_grad():
	    outputs = model(**inputs)

	# One logit per (image, text) pair; sigmoid scores each caption independently.
	logits_per_image = outputs.logits_per_image
	probs = torch.sigmoid(logits_per_image)
	```

	```python
	>>> probs
	tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])
	```
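Because each score comes from a sigmoid rather than a softmax, the six values are independent per caption and do not sum to 1. The sketch below pairs the printed scores with their captions and ranks them. The scores are copied from the output above, and the 0.5 cutoff is an illustrative assumption, not a recommended threshold.

```python
# Captions and scores copied from the example above.
texts = [
    "고양이 한 마리",
    "고양이 두 마리",
    "분홍색 소파에 드러누운 고양이 친구들",
    "리모컨과 고양이 두마리",
    "리모컨 두 개와 고양이 두마리",
    "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리",
]
probs = [0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]

# Rank captions from highest to lowest score.
ranked = sorted(zip(texts, probs), key=lambda pair: pair[1], reverse=True)
for text, p in ranked:
    print(f"{p:.4f}  {text}")

best_text, best_prob = ranked[0]

# Keep every caption above an assumed cutoff; several can pass at once.
threshold = 0.5
accepted = [text for text, p in zip(texts, probs) if p >= threshold]
```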

	## MS-COCO Caption Evaluation
	| Model | Parameters | (En) I-T Recall@1 | (En) T-I Recall@1 | (Ko) I-T Recall@1 | (Ko) T-I Recall@1 |
	|---|---|---|---|---|---|
	| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
	| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
	| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |
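For readers unfamiliar with the metric, here is a rough sketch of how image-to-text (I-T) and text-to-image (T-I) Recall@1 can be computed from a similarity matrix. It assumes the common retrieval convention in which an image may have several matching captions. The model card does not describe its evaluation script, so the function `recall_at_1` and the toy data are hypothetical.

```python
def recall_at_1(sim, text_to_image):
    """sim[i][j] is the score between image i and text j;
    text_to_image[j] is the index of the image that text j describes."""
    n_images, n_texts = len(sim), len(sim[0])

    # I-T: the top-scoring text for each image must describe that image.
    i2t_hits = 0
    for i in range(n_images):
        best_text = max(range(n_texts), key=lambda j: sim[i][j])
        if text_to_image[best_text] == i:
            i2t_hits += 1

    # T-I: the top-scoring image for each text must be its own image.
    t2i_hits = 0
    for j in range(n_texts):
        best_image = max(range(n_images), key=lambda i: sim[i][j])
        if text_to_image[j] == best_image:
            t2i_hits += 1

    return i2t_hits / n_images, t2i_hits / n_texts


# Toy data: 2 images, 4 captions (captions 0-1 belong to image 0, 2-3 to image 1).
sim = [
    [0.9, 0.2, 0.1, 0.3],
    [0.4, 0.5, 0.8, 0.7],
]
text_to_image = [0, 0, 1, 1]

i2t, t2i = recall_at_1(sim, text_to_image)
print(f"I-T R@1 = {i2t:.2%}, T-I R@1 = {t2i:.2%}")
```

In the toy data, caption 1 scores higher against image 1 than against its own image 0, so T-I Recall@1 drops to 75% while I-T Recall@1 stays at 100%.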