---
language:
- ko
- en
license: mit
metrics:
- recall
base_model:
- google/siglip2-base-patch16-224
tags:
- zero-shot-image-classification
---
# siglip2-base-patch16-224-ko
A SigLIP2 model with stronger Korean understanding, obtained by training google/siglip2-base-patch16-224 with the method of [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813).
Training data: AI Hub English-Korean parallel dataset.

Evaluation data: MS-COCO caption dataset (English and Korean captions).
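The linked paper trains a student model so that both an English sentence and its translation land on the teacher's embedding of the English sentence. Below is a minimal PyTorch sketch of that mean-squared-error objective. It assumes, since this card does not say, that the teacher is the original SigLIP2 text encoder, the student is the text tower being trained, and each batch holds aligned English-Korean sentence pairs. The function name `multilingual_distillation_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F


def multilingual_distillation_loss(teacher_en: torch.Tensor,
                                   student_en: torch.Tensor,
                                   student_ko: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper for the distillation objective of Reimers & Gurevych (2020).

    Assumption: teacher = frozen original text encoder, student = encoder being trained.
    teacher_en: teacher embeddings of English sentences, shape (B, D)
    student_en: student embeddings of the same English sentences, shape (B, D)
    student_ko: student embeddings of their Korean translations, shape (B, D)
    """
    # The teacher's English embedding is the regression target for both inputs;
    # detaching it keeps gradients out of the teacher.
    target = teacher_en.detach()
    # mean reduction over all B * D elements (a scaling choice)
    return F.mse_loss(student_en, target) + F.mse_loss(student_ko, target)


# Toy usage with random (B=4, D=8) embeddings
teacher_en = torch.randn(4, 8)
student_en = torch.randn(4, 8, requires_grad=True)
student_ko = torch.randn(4, 8, requires_grad=True)
loss = multilingual_distillation_loss(teacher_en, student_en, student_ko)
loss.backward()  # gradients reach only the student embeddings
```

Anchoring the Korean embedding to the teacher's English embedding is what lets the English image-text alignment carry over to Korean text.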
## How to use
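```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "hyunlord/siglip2-base-patch16-224-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

# COCO val2017 sample: two cats and two remote controls on a pink sofa
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Korean candidate captions (English glosses in comments)
texts = ["고양이 한 마리",                                       # one cat
         "고양이 두 마리",                                       # two cats
         "분홍색 소파에 드러누운 고양이 친구들",                   # cat friends lying on a pink sofa
         "리모컨과 고양이 두마리",                                # a remote control and two cats
         "리모컨 두 개와 고양이 두마리",                           # two remote controls and two cats
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리"]   # two remotes and two cats lying on a pink sofa

# SigLIP2 text inputs are padded to a fixed length of 64 tokens
inputs = processor(text=texts,
                   images=image,
                   padding="max_length",
                   max_length=64,
                   return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits_per_image = outputs.logits_per_image  # shape: (num_images, num_texts)
# Sigmoid, not softmax: each image-text pair is scored independently
probs = torch.sigmoid(logits_per_image)
```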
```python
>>> probs
tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])
```
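These probabilities are not a softmax over the captions, so they do not sum to 1: SigLIP scores every image-text pair with its own sigmoid. The sketch below reproduces that scoring from raw embeddings, assuming the usual SigLIP formulation (L2-normalised embeddings, cosine similarity multiplied by `exp(logit_scale)` plus `logit_bias`). With a loaded model, the embeddings would come from `model.get_image_features` and `model.get_text_features`, and the scale and bias from `model.logit_scale` and `model.logit_bias`. The helper `siglip_pairwise_probs` is hypothetical.

```python
import torch
import torch.nn.functional as F


def siglip_pairwise_probs(image_embeds: torch.Tensor,
                          text_embeds: torch.Tensor,
                          logit_scale: torch.Tensor,
                          logit_bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: independent sigmoid score for every image-text pair.

    image_embeds: (num_images, D), text_embeds: (num_texts, D).
    logit_scale is assumed to be stored in log space, so it is exponentiated.
    """
    # L2-normalise so the dot product becomes cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # (num_images, num_texts) matrix of scaled and shifted similarities
    logits_per_image = image_embeds @ text_embeds.T * logit_scale.exp() + logit_bias
    # Sigmoid per entry, so rows are not constrained to sum to 1
    return torch.sigmoid(logits_per_image)


# Toy example: one image, two captions, scale exp(0) = 1, bias 0
probs = siglip_pairwise_probs(torch.tensor([[1.0, 0.0]]),
                              torch.tensor([[1.0, 0.0], [0.0, 1.0]]),
                              torch.tensor(0.0),
                              torch.tensor(0.0))
print(probs)  # tensor([[0.7311, 0.5000]])
```

Because each score is independent, several captions (here, every caption mentioning two cats) can all receive high probability at once.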
## MS-COCO Caption Evaluation
| Model | Parameters | (En) I-T Recall@1 | (En) T-I Recall@1 | (Ko) I-T Recall@1 | (Ko) T-I Recall@1 |
|---|---|---|---|---|---|
| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |

I-T: image-to-text retrieval; T-I: text-to-image retrieval. (En) and (Ko) indicate whether the English or the Korean captions were used.
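Recall@1 counts how often the top-ranked candidate is a correct match. The card does not spell out its evaluation protocol, so the following is a sketch of the common MS-COCO setup, where one image can have several captions. For image-to-text, an image query is a hit if its top-scoring caption describes it; for text-to-image, a caption query is a hit if its top-scoring image is the one it describes. `recall_at_1` and `caption_to_image` are illustrative names.

```python
import torch


def recall_at_1(sim: torch.Tensor, caption_to_image: torch.Tensor) -> tuple[float, float]:
    """Hypothetical helper returning (image-to-text R@1, text-to-image R@1).

    sim: (num_images, num_captions) similarity or logit matrix.
    caption_to_image: (num_captions,) index of the image each caption describes.
    """
    image_ids = torch.arange(sim.size(0))
    # Image -> text: the best caption for each image must describe that image
    best_caption = sim.argmax(dim=1)
    i2t = (caption_to_image[best_caption] == image_ids).float().mean().item()
    # Text -> image: the best image for each caption must be the one it describes
    best_image = sim.argmax(dim=0)
    t2i = (best_image == caption_to_image).float().mean().item()
    return i2t, t2i


# Toy example: 2 images, 4 captions (captions 0-1 describe image 0, 2-3 image 1)
sim = torch.tensor([[0.9, 0.1, 0.2, 0.3],
                    [0.1, 0.2, 0.8, 0.4]])
caption_to_image = torch.tensor([0, 0, 1, 1])
print(recall_at_1(sim, caption_to_image))  # (1.0, 0.75)
```

The model's `logits_per_image` could serve directly as `sim`: the sigmoid is monotonic, so applying it would not change any argmax.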