---
license: apache-2.0
language:
- fr
- en
pipeline_tag: zero-shot-object-detection
library_name: transformers
base_model:
- omlab/omdet-turbo-swin-tiny-hf
tags:
- endpoints-template
---

# Fork of [omlab/omdet-turbo-swin-tiny-hf](https://huggingface.co/omlab/omdet-turbo-swin-tiny-hf) for a `zero-shot-object-detection` Inference endpoint.

This repository implements a `custom` task for `zero-shot-object-detection` for 🤗 Inference Endpoints. The code for the customized handler is in the [handler.py](https://huggingface.co/Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf/blob/main/handler.py).
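
For reference, a custom handler for 🤗 Inference Endpoints is a Python class named `EndpointHandler` with an `__init__` and a `__call__` method. Below is a minimal sketch of what such a handler could look like for this model. It is illustrative only and not the exact contents of this repository's `handler.py`; the post-processing call follows the OmDet-Turbo example in the transformers documentation, and keyword/result names may differ between transformers versions.

``` python
# Minimal sketch of a custom zero-shot-object-detection handler.
# Illustrative only; see handler.py in this repository for the actual implementation.
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the repository files on the endpoint
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = OmDetTurboForObjectDetection.from_pretrained(path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data["inputs"]
        image = Image.open(BytesIO(base64.b64decode(inputs["image"]))).convert("RGB")
        candidates = inputs["candidates"]

        model_inputs = self.processor(image, text=candidates, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**model_inputs)

        # Post-processing as documented for OmDet-Turbo; result key names
        # ("classes" vs. "text_labels") depend on the transformers version.
        results = self.processor.post_process_grounded_object_detection(
            outputs,
            classes=candidates,
            target_sizes=[image.size[::-1]],
            score_threshold=0.3,
            nms_threshold=0.3,
        )[0]

        return {
            "boxes": results["boxes"].tolist(),
            "scores": results["scores"].tolist(),
            "candidates": results["classes"],
        }
```

The returned dictionary mirrors the response format documented further down (`boxes`, `scores`, `candidates`).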

To deploy this model as an Inference Endpoint, you have to select `Custom` as the task so that the `handler.py` file is used.

The repository also contains a `requirements.txt` that installs the `timm` library needed by the handler.
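
Besides the UI, the endpoint can also be created programmatically with `huggingface_hub`. The snippet below is a hedged sketch: `create_inference_endpoint` is part of `huggingface_hub`, but the vendor, region, and instance values are placeholders you would adapt to your own account.

``` python
from huggingface_hub import create_inference_endpoint

# vendor/region/instance_* values are illustrative placeholders; pick ones
# available to your account in the Inference Endpoints UI.
endpoint = create_inference_endpoint(
    "omdet-turbo-zero-shot",
    repository="Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf",
    framework="pytorch",
    task="custom",  # required so handler.py is used
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-t4",
    type="protected",
)
endpoint.wait()
print(endpoint.url)
```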

### Expected request payload

```json
{
  "inputs": {
    "image": "/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC....", // base64-encoded image bytes
    "candidates": ["broken curb", "broken road", "broken road sign", "broken sidewalk"]
  }
}
```

Below is an example of how to run a request using Python and the `requests` library.

## Run Request 

``` python
import json
from typing import List
import requests as r
import base64

ENDPOINT_URL = ""  # URL of your deployed Inference Endpoint
HF_TOKEN = ""      # Hugging Face access token with permission to call the endpoint

def predict(path_to_image: str = None, candidates: List[str] = None):
    with open(path_to_image, "rb") as i:
        b64 = base64.b64encode(i.read())

    payload = {"inputs": {"image": b64.decode("utf-8"), "candidates": candidates}}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()


prediction = predict(
    path_to_image="image/brokencurb.jpg", candidates=["broken curb", "broken road", "broken road sign", "broken sidewalk"]
)
print(json.dumps(prediction, indent=2))
```
Expected output:

``` json
{
  "boxes": [
    [
      1.919342041015625,
      231.1556396484375,
      1011.4019775390625,
      680.3773193359375
    ],
    [
      610.9949951171875,
      397.6180419921875,
      1019.9259033203125,
      510.8144226074219
    ],
    [
      1.919342041015625,
      231.1556396484375,
      1011.4019775390625,
      680.3773193359375
    ],
    [
      786.1240234375,
      68.618896484375,
      916.1265869140625,
      225.0513458251953
    ]
  ],
  "scores": [
    0.4329715967178345,
    0.4215811491012573,
    0.3389397859573364,
    0.3133399784564972
  ],
  "candidates": [
    "broken sidewalk",
    "broken road sign",
    "broken road",
    "broken road sign"
  ]
}
```
Each box is structured as `[x_min, y_min, x_max, y_max]` in absolute pixel coordinates.
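
Since this is the same `(left, upper, right, lower)` convention used by PIL, a detection can, for example, be cropped straight out of the input image. The snippet below is a small illustration reusing the `prediction` dictionary from the request example above (same assumed file path):

``` python
from PIL import Image

# Reuses `prediction` from the request example above.
image = Image.open("image/brokencurb.jpg")

# Crop the highest-scoring detection; PIL's crop() expects (left, upper, right, lower),
# which matches the [x_min, y_min, x_max, y_max] box layout.
x_min, y_min, x_max, y_max = prediction["boxes"][0]
crop = image.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
crop.save("top_detection.jpg")
```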

## Visualize the result

<figure>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/KFT09GSYWn2gEllATejSZ.png" alt="image/png">
  <figcaption>input image</figcaption>
</figure>

To visualize the result of the request, you can use the following code:

``` python
prediction = predict(
    path_to_image="image/cat_and_remote.jpg", candidates=["cat", "remote", "pot hole"]
)

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load the image used for the prediction
image = plt.imread("image/cat_and_remote.jpg")
    
# Plot image
fig, ax = plt.subplots(1)
ax.imshow(image)
for score, class_name, box in zip(
    prediction["scores"], prediction["candidates"], prediction["boxes"]
):
    # Create a Rectangle patch
    rect = patches.Rectangle([int(box[0]), int(box[1])], int(box[2] - box[0]), int(box[3] - box[1]), linewidth=1, edgecolor='r', facecolor='none')
    # Add the patch to the Axes
    ax.add_patch(rect)
    
    ax.text(int(box[0]), int(box[1]), str(round(score, 2)) + " " + str(class_name), color='white', fontsize=6, bbox=dict(facecolor='red', alpha=0.5))
    
plt.savefig('image_result/cat_and_remote_with_bboxes_zero_shot.jpeg')
```

**Result**

<figure>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/661e3161112a872ebdee8bbc/8xPoidjVyRQBs990hR4sq.png" alt="image/png">
  <figcaption>output image</figcaption>
</figure>


## Credits

This adaptation for Hugging Face Inference Endpoints was inspired by [@philschmid](https://huggingface.co/philschmid)'s work on [philschmid/clip-zero-shot-image-classification](https://huggingface.co/philschmid/clip-zero-shot-image-classification).