---
language: []
license: mit
tags:
  - pytorch
  - image-segmentation
  - sam2
  - glove
  - baseball
  - sports-analytics
  - computer-vision
  - custom-model
library_name: pytorch
datasets:
  - custom
metrics:
  - dice
  - iou
inference: true
widget: []
model-index:
  - name: glove_labelling
    results: []
---

# Glove Labelling Model (SAM2 fine-tuned)

This repository contains a [SAM2](https://github.com/facebookresearch/sam2) hierarchical image-segmentation model fine-tuned for high-precision segmentation of baseball gloves in pitching video.

### 💡 What it does

Given a frame from a pitching video, the model outputs per-pixel segmentation masks for the following classes:

- `glove_outline`
- `webbing`
- `thumb`
- `palm_pocket`
- `hand`
- `glove_exterior`

The model was trained on individual pitch frame sequences annotated with COCO-format masks.
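The class-index order below is an assumption for illustration (it mirrors the list above); the actual mapping is defined by the training configuration, so verify it before relying on these indices:

```python
# Hypothetical index-to-name mapping; check the training config
# for the actual channel order before using these indices.
CLASS_NAMES = [
    "glove_outline",
    "webbing",
    "thumb",
    "palm_pocket",
    "hand",
    "glove_exterior",
]

def label_for(index: int) -> str:
    """Return the class name for a predicted channel index."""
    return CLASS_NAMES[index]
```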

---

### 🏗 Architecture

- Base Model: `SAM2Hierarchical`
- Framework: PyTorch
- Input shape: `[1, 3, 720, 1280]` RGB frame
- Output: Segmentation logits across 6 glove-related classes
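A quick shape sketch of the input/output contract, using random NumPy data as a stand-in for the model (the assumption that logits are returned at input resolution is not confirmed by this card):

```python
import numpy as np

# Stand-in for model output: [batch, classes, height, width]
logits = np.random.randn(1, 6, 720, 1280)

# Per-pixel class labels: argmax over the class axis
labels = logits.argmax(axis=1)  # shape (1, 720, 1280)
mask = labels.squeeze(0)        # shape (720, 1280), values in 0..5
```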

---

### 🔧 Usage

To use the model for inference:

```python
import torch
from PIL import Image
import torchvision.transforms as T

# Loads the full pickled model object: the SAM2 model code must be
# importable, and recent PyTorch versions require weights_only=False
# to unpickle arbitrary objects.
model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
model.eval()

transform = T.Compose([
    T.Resize((720, 1280)),  # match the expected 720x1280 input resolution
    T.ToTensor()
])

img = Image.open("example.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # [1, 3, 720, 1280]

with torch.no_grad():
    output = model(x)

# Convert per-class logits to a per-pixel class-label map
pred_mask = output.argmax(dim=1).squeeze(0).cpu().numpy()
```
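Since the card reports Dice and IoU, here is a minimal sketch of computing both for one class from a predicted label map like `pred_mask` above; the ground-truth label map (`gt`) and the helper itself are hypothetical, not part of this repository:

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, class_id: int):
    """Per-class Dice and IoU between two integer label maps."""
    p = pred == class_id
    g = gt == class_id
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    denom = p.sum() + g.sum()
    dice = 2.0 * inter / denom if denom else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# Toy example on a 2x2 label map: class 1 overlaps in one pixel
pred = np.array([[1, 1], [0, 2]])
gt = np.array([[1, 0], [0, 2]])
dice, iou = dice_iou(pred, gt, class_id=1)  # dice = 2/3, iou = 1/2
```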