|
--- |
|
license: apache-2.0 |
|
language: |
|
- fr |
|
- en |
|
pipeline_tag: zero-shot-object-detection |
|
library_name: transformers |
|
base_model: |
|
- omlab/omdet-turbo-swin-tiny-hf |
|
tags: |
|
- endpoints-template |
|
--- |
|
|
|
# Fork of [omlab/omdet-turbo-swin-tiny-hf](https://huggingface.co/omlab/omdet-turbo-swin-tiny-hf) for a `zero-shot-object-detection` Inference Endpoint.
|
|
|
This repository implements a `custom` task for `zero-shot-object-detection` for 🤗 Inference Endpoints. The code for the customized handler is in the [handler.py](https://huggingface.co/Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf/blob/main/handler.py). |
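For context, custom handlers for Inference Endpoints follow the `EndpointHandler` convention: a class with an `__init__` method that loads the model and a `__call__` method that runs inference. The sketch below illustrates roughly what such a handler can look like for this model. It is an illustration only, not a copy of the actual `handler.py`, and the arguments of `post_process_grounded_object_detection` have changed across `transformers` releases, so adjust them to the version you pin.

``` python
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local checkout of this repository inside the endpoint
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = OmDetTurboForObjectDetection.from_pretrained(path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        # decode the base64 payload into a PIL image
        image = Image.open(BytesIO(base64.b64decode(inputs["image"]))).convert("RGB")
        candidates = inputs["candidates"]

        encoding = self.processor(image, text=candidates, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**encoding)

        # Post-processing maps raw outputs back to pixel coordinates;
        # argument names may differ depending on the transformers version.
        results = self.processor.post_process_grounded_object_detection(
            outputs,
            classes=candidates,
            target_sizes=[image.size[::-1]],
            score_threshold=0.3,
            nms_threshold=0.3,
        )[0]

        return {
            "boxes": results["boxes"].tolist(),
            "scores": results["scores"].tolist(),
            "candidates": results["classes"],
        }
```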
|
|
|
To deploy this model as an Inference Endpoint, you have to select `Custom` as the task so that the `handler.py` file is used.
|
|
|
The repository contains a `requirements.txt` that installs the `timm` library.
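In its simplest form the file only needs to list that extra dependency; check the repository's `requirements.txt` for the exact contents and any pinned version:

```
timm
```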
|
|
|
### Expected request payload
|
|
|
```json |
|
{
  "inputs": {
    "image": "/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC....", // base64-encoded image bytes
    "candidates": ["broken curb", "broken road", "broken road sign", "broken sidewalk"]
  }
}
|
``` |
|
|
|
Below is an example of how to run a request using Python and `requests`.
|
|
|
## Run Request |
|
|
|
``` python |
|
import json |
|
from typing import List |
|
import requests as r |
|
import base64 |
|
|
|
ENDPOINT_URL = ""  # URL of your deployed Inference Endpoint

HF_TOKEN = ""  # Hugging Face access token with access to the endpoint
|
|
|
def predict(path_to_image: str = None, candidates: List[str] = None): |
|
with open(path_to_image, "rb") as i: |
|
b64 = base64.b64encode(i.read()) |
|
|
|
payload = {"inputs": {"image": b64.decode("utf-8"), "candidates": candidates}} |
|
response = r.post( |
|
ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload |
|
) |
|
return response.json() |
|
|
|
|
|
prediction = predict( |
|
path_to_image="image/brokencurb.jpg", candidates=["broken curb", "broken road", "broken road sign", "broken sidewalk"] |
|
) |
|
print(json.dumps(prediction, indent=2)) |
|
``` |
|
Expected output:
|
|
|
```json
|
{ |
|
"boxes": [ |
|
[ |
|
1.919342041015625, |
|
231.1556396484375, |
|
1011.4019775390625, |
|
680.3773193359375 |
|
], |
|
[ |
|
610.9949951171875, |
|
397.6180419921875, |
|
1019.9259033203125, |
|
510.8144226074219 |
|
], |
|
[ |
|
1.919342041015625, |
|
231.1556396484375, |
|
1011.4019775390625, |
|
680.3773193359375 |
|
], |
|
[ |
|
786.1240234375, |
|
68.618896484375, |
|
916.1265869140625, |
|
225.0513458251953 |
|
] |
|
], |
|
"scores": [ |
|
0.4329715967178345, |
|
0.4215811491012573, |
|
0.3389397859573364, |
|
0.3133399784564972 |
|
], |
|
"candidates": [ |
|
"broken sidewalk", |
|
"broken road sign", |
|
"broken road", |
|
"broken road sign" |
|
] |
|
} |
|
``` |
|
Each box is structured as `[x_min, y_min, x_max, y_max]`, in pixel coordinates.
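If you only want high-confidence detections, you can filter the response by score. Here is a small sketch using the `prediction` dictionary returned by the `predict` helper above (the 0.4 threshold is an arbitrary illustration):

``` python
MIN_SCORE = 0.4  # illustrative cut-off, tune for your use case

for box, score, candidate in zip(
    prediction["boxes"], prediction["scores"], prediction["candidates"]
):
    if score >= MIN_SCORE:
        print(f"{candidate}: {score:.2f} at {box}")
```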
|
|
|
## Visualize the result
|
|
|
<figure> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/KFT09GSYWn2gEllATejSZ.png" alt="image/png"> |
|
<figcaption>input image</figcaption> |
|
</figure> |
|
|
|
To visualize the result of the request, you can use the following code:
|
|
|
``` python |
|
prediction = predict( |
|
path_to_image="image/cat_and_remote.jpg", candidates=["cat", "remote", "pot hole"] |
|
) |
|
|
|
import matplotlib.pyplot as plt |
|
import matplotlib.patches as patches |
|
|
|
with open("image/cat_and_remote.jpg", "rb") as i: |
|
image = plt.imread(i) |
|
|
|
# Plot image |
|
fig, ax = plt.subplots(1) |
|
ax.imshow(image) |
|
for score, class_name, box in zip( |
|
prediction["scores"], prediction["candidates"], prediction["boxes"] |
|
): |
|
# Create a Rectangle patch |
|
rect = patches.Rectangle([int(box[0]), int(box[1])], int(box[2] - box[0]), int(box[3] - box[1]), linewidth=1, edgecolor='r', facecolor='none') |
|
# Add the patch to the Axes |
|
ax.add_patch(rect) |
|
|
|
    # Label the box with its matched candidate and score
    ax.text(int(box[0]), int(box[1]), str(round(score, 2)) + " " + str(class_name), color='white', fontsize=6, bbox=dict(facecolor='red', alpha=0.5))
|
|
|
plt.savefig('image_result/cat_and_remote_with_bboxes_zero_shot.jpeg') |
|
``` |
|
|
|
**Result**
|
|
|
<figure> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/8xPoidjVyRQBs990hR4sq.png" alt="image/png"> |
|
<figcaption>output image</figcaption> |
|
</figure> |
|
|
|
|
|
## Credits |
|
|
|
This adaptation for Hugging Face Inference Endpoints was inspired by [@philschmid](https://huggingface.co/philschmid)'s work on [philschmid/clip-zero-shot-image-classification](https://huggingface.co/philschmid/clip-zero-shot-image-classification).
|
|