---
language:
- ko
- en
license: mit
metrics:
- recall
base_model:
- google/siglip2-base-patch16-224
tags:
- zero-shot-image-classification
---

# siglip2-base-patch16-224-ko

This is a SigLIP2 model with strengthened Korean understanding. It was trained from google/siglip2-base-patch16-224 using the approach described in [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813).

Training data: AI Hub English-Korean parallel dataset

Evaluation data: MS-COCO caption English-Korean dataset

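
The cited distillation approach trains a student so that its embeddings of an English sentence and of the Korean translation both land near a frozen teacher's embedding of the English sentence, which is why a parallel corpus is needed. Below is a minimal, framework-free sketch of that objective on toy vectors. The function names and numbers are hypothetical, and this card does not state the exact training recipe, so treat it as an illustration of the paper's loss rather than the code used for this model.

```python
# Toy sketch of the multilingual distillation objective from the cited paper.
# Embeddings are plain lists of floats; a real run would use framework tensors.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(teacher_en, student_en, student_ko):
    """Pull the student's English and Korean embeddings toward the teacher's English embedding."""
    return mse(student_en, teacher_en) + mse(student_ko, teacher_en)

# Hypothetical 3-dimensional embeddings for one English-Korean sentence pair.
teacher_en = [0.2, -0.1, 0.7]
student_en = [0.2, -0.1, 0.7]   # already matches the teacher
student_ko = [0.0, -0.1, 0.5]   # the Korean side still has to move

loss = distillation_loss(teacher_en, student_en, student_ko)
print(f"distillation loss: {loss:.4f}")
```

Only the Korean term contributes to the loss here, since the English student embedding already equals the teacher's.
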
## How to use

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "hyunlord/siglip2-base-patch16-224-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

# Sample image from the COCO val2017 set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Korean candidate captions, each with an English gloss.
texts = ["고양이 한 마리",  # one cat
         "고양이 두 마리",  # two cats
         "분홍색 소파에 드러누운 고양이 친구들",  # cat friends lying on a pink sofa
         "리모컨과 고양이 두마리",  # a remote control and two cats
         "리모컨 두 개와 고양이 두마리",  # two remote controls and two cats
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리"]  # two remotes and two cats lying on a pink sofa
inputs = processor(text=texts,
                   images=image,
                   padding="max_length",
                   max_length=64,
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
# Each image-text pair is scored independently, so apply sigmoid rather than softmax.
probs = torch.sigmoid(logits_per_image)
```

```python
>>> probs
tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])
```

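
Because the scores come from a sigmoid rather than a softmax, each caption gets an independent match probability, and the values need not sum to 1. Here is a small sketch of how the output above might be post-processed using only the standard library. The English glosses of the captions and the 0.5 threshold are assumptions made for illustration.

```python
# Probabilities copied from the output above, paired with English glosses
# of the six Korean captions (the glosses are for readability only).
captions = [
    "one cat",
    "two cats",
    "cat friends lying on a pink sofa",
    "a remote control and two cats",
    "two remote controls and two cats",
    "two remote controls and two cats lying on a pink sofa",
]
probs = [0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]

# Rank captions from most to least likely match.
ranked = sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)
best_caption, best_prob = ranked[0]

# Keep every caption above a hypothetical acceptance threshold.
threshold = 0.5
accepted = [caption for caption, p in ranked if p >= threshold]

print(f"best: {best_caption} ({best_prob:.4f})")
print(f"accepted {len(accepted)} of {len(captions)} captions")
```

Several captions can pass the threshold at once, which is the expected behaviour for independent per-pair scores.
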
## MS-COCO Caption Evaluation

| Model | Parameter Size | (En) I-T Recall@1 | (En) T-I Recall@1 | (Ko) I-T Recall@1 | (Ko) T-I Recall@1 |
|---|---|---|---|---|---|
| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |
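
For readers unfamiliar with the metric, Recall@1 is the fraction of queries whose single top-ranked candidate is a correct match: captions retrieved for an image (I-T) or images retrieved for a caption (T-I). The sketch below computes both on a toy similarity matrix. The evaluation script behind the table is not part of this card, so the function name, the data, and the several-captions-per-image setup are assumptions.

```python
def recall_at_1(similarity, positives):
    """Fraction of queries whose top-scoring candidate is a correct match.

    similarity[q][c] is the score of candidate c for query q.
    positives[q] is the set of candidate indices that are correct for query q.
    """
    hits = 0
    for q, row in enumerate(similarity):
        top = max(range(len(row)), key=row.__getitem__)
        if top in positives[q]:
            hits += 1
    return hits / len(similarity)

# Toy data: 2 images and 4 captions. Captions 0-1 describe image 0,
# captions 2-3 describe image 1.
image_to_text = [
    [0.9, 0.8, 0.1, 0.2],
    [0.3, 0.7, 0.6, 0.5],  # top caption (index 1) belongs to image 0: a miss
]
i2t = recall_at_1(image_to_text, [{0, 1}, {2, 3}])

# Text-to-image uses the transposed matrix: one row per caption.
text_to_image = [list(column) for column in zip(*image_to_text)]
t2i = recall_at_1(text_to_image, [{0}, {0}, {1}, {1}])

print(f"I-T Recall@1: {i2t:.2%}, T-I Recall@1: {t2i:.2%}")
```

The two directions can differ on the same scores, as in this toy case, which is why the table reports them separately.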