---
license: apache-2.0
language:
- fr
- en
pipeline_tag: zero-shot-object-detection
library_name: transformers
base_model:
- omlab/omdet-turbo-swin-tiny-hf
tags:
- endpoints-template
---

# Fork of [omlab/omdet-turbo-swin-tiny-hf](https://huggingface.co/omlab/omdet-turbo-swin-tiny-hf) for a `zero-shot-object-detection` Inference endpoint.

This repository implements a `custom` task for `zero-shot-object-detection` for 🤗 Inference Endpoints. The code for the customized handler is in the [handler.py](https://huggingface.co/Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf/blob/main/handler.py).
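
For reference, a custom handler for 🤗 Inference Endpoints is a Python class named `EndpointHandler` with an `__init__` and a `__call__` method. Below is a minimal sketch of what such a handler could look like for this model. It is illustrative only and not the exact contents of this repository's `handler.py`; the post-processing call follows the OmDet-Turbo example in the transformers documentation, and keyword/result names may differ between transformers versions.

``` python
# Minimal sketch of a custom zero-shot-object-detection handler.
# Illustrative only; see handler.py in this repository for the actual implementation.
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the repository files on the endpoint
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = OmDetTurboForObjectDetection.from_pretrained(path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        image = Image.open(BytesIO(base64.b64decode(inputs["image"]))).convert("RGB")
        candidates = inputs["candidates"]

        model_inputs = self.processor(image, text=candidates, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**model_inputs)

        # Post-processing as documented for OmDet-Turbo; result key names
        # ("classes" vs. "text_labels") depend on the transformers version.
        results = self.processor.post_process_grounded_object_detection(
            outputs,
            classes=candidates,
            target_sizes=[image.size[::-1]],
            score_threshold=0.3,
            nms_threshold=0.3,
        )[0]

        return {
            "boxes": results["boxes"].tolist(),
            "scores": results["scores"].tolist(),
            "candidates": results["classes"],
        }
```

The returned dictionary mirrors the response format documented further down (`boxes`, `scores`, `candidates`).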

To deploy this model as an Inference Endpoint, you have to select `Custom` as the task so that the `handler.py` file is used.

The repository also contains a `requirements.txt` that installs the `timm` library needed by the handler.
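
Besides the UI, the endpoint can also be created programmatically with `huggingface_hub`. The snippet below is a hedged sketch: `create_inference_endpoint` is part of `huggingface_hub`, but the vendor, region, and instance values are placeholders you would adapt to your own account.

``` python
from huggingface_hub import create_inference_endpoint

# vendor/region/instance_* values are illustrative placeholders; pick ones
# available to your account in the Inference Endpoints UI.
endpoint = create_inference_endpoint(
    "omdet-turbo-zero-shot",
    repository="Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf",
    framework="pytorch",
    task="custom",  # required so handler.py is used
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-t4",
    type="protected",
)
endpoint.wait()
print(endpoint.url)
```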

### Expected request payload

```json
{
  "inputs": {
    "image": "/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC....", // base64-encoded image bytes
    "candidates": ["broken curb", "broken road", "broken road sign", "broken sidewalk"]
  }
}
```

Below is an example of how to run a request using Python and the `requests` library.

## Run Request 

``` python
import json
from typing import List
import requests as r
import base64

ENDPOINT_URL = ""  # URL of your deployed Inference Endpoint
HF_TOKEN = ""      # Hugging Face access token with permission to call the endpoint

def predict(path_to_image: str = None, candidates: List[str] = None):
    with open(path_to_image, "rb") as i:
        b64 = base64.b64encode(i.read())

    payload = {"inputs": {"image": b64.decode("utf-8"), "candidates": candidates}}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()


prediction = predict(
    path_to_image="image/brokencurb.jpg", candidates=["broken curb", "broken road", "broken road sign", "broken sidewalk"]
)
print(json.dumps(prediction, indent=2))
```
Expected output:

``` json
{
  "boxes": [
    [
      1.919342041015625,
      231.1556396484375,
      1011.4019775390625,
      680.3773193359375
    ],
    [
      610.9949951171875,
      397.6180419921875,
      1019.9259033203125,
      510.8144226074219
    ],
    [
      1.919342041015625,
      231.1556396484375,
      1011.4019775390625,
      680.3773193359375
    ],
    [
      786.1240234375,
      68.618896484375,
      916.1265869140625,
      225.0513458251953
    ]
  ],
  "scores": [
    0.4329715967178345,
    0.4215811491012573,
    0.3389397859573364,
    0.3133399784564972
  ],
  "candidates": [
    "broken sidewalk",
    "broken road sign",
    "broken road",
    "broken road sign"
  ]
}
```
Each box is structured as `[x_min, y_min, x_max, y_max]` in absolute pixel coordinates.
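
Since this is the same `(left, upper, right, lower)` convention used by PIL, a detection can, for example, be cropped straight out of the input image. The snippet below is a small illustration reusing the `prediction` dictionary from the request example above (same assumed file path):

``` python
from PIL import Image

# Reuses `prediction` from the request example above.
image = Image.open("image/brokencurb.jpg")

# Crop the highest-scoring detection; PIL's crop() expects (left, upper, right, lower),
# which matches the [x_min, y_min, x_max, y_max] box layout.
x_min, y_min, x_max, y_max = prediction["boxes"][0]
crop = image.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
crop.save("top_detection.jpg")
```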

## Visualize the result

<figure>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/KFT09GSYWn2gEllATejSZ.png" alt="image/png">
  <figcaption>input image</figcaption>
</figure>

To visualize the result of the request, you can use the following code:

``` python
prediction = predict(
    path_to_image="image/cat_and_remote.jpg", candidates=["cat", "remote", "pot hole"]
)

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load the image used for the prediction
image = plt.imread("image/cat_and_remote.jpg")
    
# Plot image
fig, ax = plt.subplots(1)
ax.imshow(image)
for score, class_name, box in zip(
    prediction["scores"], prediction["candidates"], prediction["boxes"]
):
    # Create a Rectangle patch
    rect = patches.Rectangle([int(box[0]), int(box[1])], int(box[2] - box[0]), int(box[3] - box[1]), linewidth=1, edgecolor='r', facecolor='none')
    # Add the patch to the Axes
    ax.add_patch(rect)
    
    ax.text(int(box[0]), int(box[1]), str(round(score, 2)) + " " + str(class_name), color='white', fontsize=6, bbox=dict(facecolor='red', alpha=0.5))
    
plt.savefig('image_result/cat_and_remote_with_bboxes_zero_shot.jpeg')
```

**Result**

<figure>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/8xPoidjVyRQBs990hR4sq.png" alt="image/png">
  <figcaption>output image</figcaption>
</figure>


## Credits

This adaptation for Hugging Face Inference Endpoints was inspired by [@philschmid](https://huggingface.co/philschmid)'s work on [philschmid/clip-zero-shot-image-classification](https://huggingface.co/philschmid/clip-zero-shot-image-classification).