---
language:
- ko
- en
license: mit
metrics:
- recall
base_model:
- google/siglip2-base-patch16-224
tags:
- zero-shot-image-classification
---

# siglip2-base-patch16-224-ko

This is a SigLIP 2 model with strengthened Korean understanding, built by training google/siglip2-base-patch16-224 with the approach from [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813).

Training data: AI Hub English-Korean parallel dataset

Evaluation data: MS-COCO caption English-Korean dataset

## How to use

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "hyunlord/siglip2-base-patch16-224-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

# Example image: two cats lying on a pink couch with two remote controls (COCO val2017).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Korean candidate captions, roughly from least to most specific.
texts = ["고양이 한 마리",                                          # one cat
         "고양이 두 마리",                                          # two cats
         "분홍색 소파에 드러누운 고양이 친구들",                    # cat friends lying on a pink sofa
         "리모컨과 고양이 두 마리",                                 # a remote control and two cats
         "리모컨 두 개와 고양이 두 마리",                           # two remote controls and two cats
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두 마리"]  # two remotes and two cats lying on a pink sofa
inputs = processor(text=texts,
                   images=image,
                   padding="max_length",   # pad text to the fixed length used by SigLIP 2
                   max_length=64,
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)  # independent sigmoid scores, one per caption
```

```python
>>> probs
tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])
```

Because SigLIP is trained with a sigmoid rather than a softmax loss, each score is an independent probability and the values do not sum to 1; the most specific caption (two remote controls and two cats lying on a pink sofa) receives the highest score.

## MS-COCO Caption Evaluation

Recall@1 on MS-COCO captions for image-to-text (I-T) and text-to-image (T-I) retrieval, in English (En) and Korean (Ko):
| Model | Parameter Size | (En) I-T Recall@1 | (En) T-I Recall@1 | (Ko) I-T Recall@1 | (Ko) T-I Recall@1 |
|---|---|---|---|---|---|
| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |
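
For context, I-T and T-I Recall@1 measure how often the top-1 retrieved caption (or image) is a correct match. The sketch below is a simplified illustration, assuming one caption per image and embeddings taken from `model.get_image_features` / `model.get_text_features`; the exact pairing protocol and batching behind the table above are not documented here.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_at_1(image_embs: torch.Tensor, text_embs: torch.Tensor):
    """image_embs, text_embs: (N, D) tensors where row i of each side
    belongs to the same image-caption pair. Returns (I-T R@1, T-I R@1)."""
    image_embs = F.normalize(image_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    sim = image_embs @ text_embs.T                               # (N, N) cosine similarities
    targets = torch.arange(sim.size(0))
    i2t = (sim.argmax(dim=1) == targets).float().mean().item()   # image -> text
    t2i = (sim.argmax(dim=0) == targets).float().mean().item()   # text -> image
    return i2t, t2i
```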