Bounding box coordinates
What rescaling is needed so that the bbox coordinates match the original image? I am seeing some mismatches but can't figure out what the issue is.
The bbox_2d coordinates are x1, y1, x2, y2 rather than x, y, w, h. If you resize the image, they are relative to the resized image size, not the original. For example:
from PIL import Image

image = Image.open(image_path)
img_width, img_height = image.size
max_size = 1280
if max(image.size) > max_size:
    # scale the longer side down to max_size, keeping the aspect ratio
    ratio = max_size / max(image.size)
    new_size = tuple(int(dim * ratio) for dim in image.size)
    # set each dimension to be a multiple of 28
    new_size = tuple(int(dim // 28) * 28 for dim in new_size)
    image = image.resize(new_size, Image.LANCZOS)
    img_width, img_height = image.size
Then, in the messages:
{
    "role": "user",
    "content": [
        {
            "type": "image",
            "image": f"file://{image_path}",
            "resized_width": img_width,
            "resized_height": img_height,
        },
        .....
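To draw a predicted box on the original image, each coordinate can be scaled by the ratio of original to resized size along its own axis. The following is a minimal sketch, assuming the model returns pixel coordinates relative to the resized image as described above. `to_original_coords` is a hypothetical helper name, not part of any library.

```python
def to_original_coords(bbox, resized_size, original_size):
    """Map [x1, y1, x2, y2] from resized-image pixels to original-image pixels.

    Hypothetical helper: assumes bbox is expressed in pixels of the resized image.
    """
    resized_w, resized_h = resized_size
    original_w, original_h = original_size
    x1, y1, x2, y2 = bbox
    # x and y are scaled independently, because rounding each side to a
    # multiple of 28 can change the aspect ratio slightly
    return [
        round(x1 * original_w / resized_w),
        round(y1 * original_h / resized_h),
        round(x2 * original_w / resized_w),
        round(y2 * original_h / resized_h),
    ]


# A box covering the whole resized image maps to the whole original image
print(to_original_coords([0, 0, 1260, 700], (1260, 700), (1920, 1080)))
# [0, 0, 1920, 1080]
```

Note that the width and height ratios are computed separately rather than reusing the single `ratio` from the resize step.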
I cannot manage to get the coordinates right. Please help!
I have an image with traffic signs and would like to detect the stop/bus sign.
The original image is 1920x1080 pixels. With Max_Pixel 1280 I scale it down
to an image size of 1260x700 (multiples of 28 pixels, below 1280; X: 45x28, Y: 25x28).
The prompt for the 7B model is: "Locate the Stop-sign and return the location in the form of coordinates in the format {'bbox_2d': [x1, y1, x2, y2]}."
I run the detection on the scaled image, sent as Base64.
The result in X always looks good, but Y is offset (though I wonder why it is too small for the stop sign and too large for the bus sign).
I use Ollama for this model, so the complete Ollama call is:
------------------------------JSON-----------------------------------
[
  {
    "role": "system",
    "content": "You are a knowledgeable, efficient, and direct AI assistant. \r\nProvide concise answers, focusing on the key information needed. \r\nOffer suggestions tactfully when appropriate to improve outcomes. \r\nEngage in productive collaboration with the user."
  },
  {
    "role": "user",
    "content": "Locate the Bus-sign and return the location in the form of coordinates in the format {'bbox_2d': [x1, y1, x2, y2]}.",
    "Images": [
      "iVBORw0KGgoAAAANSUhEUgAABOw ... ly4cOHChQsXLlwMG7gk1oULFy5cuHDhwoULFy5cDBM4zv8HKdhcabzmo1oAAAAASUVORK5CYII="
    ]
  }
]
The annotation code is:
For Each item In items
    Dim X1 As Integer = CInt(item("bbox_2d")(0))
    Dim Y1 As Integer = CInt(item("bbox_2d")(1))
    Dim X2 As Integer = CInt(item("bbox_2d")(2))
    Dim Y2 As Integer = CInt(item("bbox_2d")(3))
    BMP2.Draw(New Rectangle(X1, Y1, X2 - X1, Y2 - Y1), New Bgra(0, 0, 255, 255), 2)
Next
It could also be a problem with Ollama, because there is no option (at least I didn't find one) to set
"resized_width": img_width,
"resized_height": img_height,
Maybe you have some suggestions for how Y could be set to the correct position.
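As a worked check of the numbers in this post, the sketch below reproduces the 1920x1080 to 1260x700 resize and prints the per-axis scale factors. It assumes the same resize rule as the Python snippet earlier in the thread (longer side capped at 1280, each side floored to a multiple of 28). `fit_to_patch_grid` is a hypothetical helper name, and nothing here describes what Ollama does internally.

```python
def fit_to_patch_grid(width, height, max_size=1280, patch=28):
    """Cap the longer side at max_size, then floor each side to a multiple of patch.

    Hypothetical helper mirroring the resize rule discussed in this thread.
    """
    longest = max(width, height)
    if longest > max_size:
        # integer arithmetic avoids floating-point rounding surprises
        width = width * max_size // longest
        height = height * max_size // longest
    return (width // patch) * patch, (height // patch) * patch


resized_w, resized_h = fit_to_patch_grid(1920, 1080)
scale_x = 1920 / resized_w  # factor to map x back to the original image
scale_y = 1080 / resized_h  # factor to map y back to the original image

print(resized_w, resized_h)                  # 1260 700
print(round(scale_x, 4), round(scale_y, 4))  # 1.5238 1.5429
```

The two factors differ slightly because flooring to multiples of 28 changes the aspect ratio, so one thing worth checking is whether a single factor is being applied to both axes when drawing. That alone may not explain the offset described above.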
Hi @Phreak87, I struggled with that as well. I attempted to explain it in this Medium post: https://medium.com/@levchevajoana/qwen2-5-vl-with-mlx-vlm-c4329b40ab87. If you have any questions, I'll try to help.