maxiw 
posted an update Sep 4
The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1000 x 1000 pixels and then scale those boxes to the original image size.
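
A minimal sketch of the scaling step, assuming the model returns each box as `(x1, y1, x2, y2)` corner coordinates in the 1000 x 1000 frame. The function name `scale_box`, the `ref_size` parameter, and the coordinate order are illustrative assumptions, not taken from the Space's code:

```python
def scale_box(box, image_width, image_height, ref_size=1000):
    """Map a box from a ref_size x ref_size frame to pixel coordinates.

    Assumes box is (x1, y1, x2, y2); this order is an assumption.
    """
    x1, y1, x2, y2 = box
    # Scale x values by the image width and y values by the image height.
    return (
        round(x1 * image_width / ref_size),
        round(y1 * image_height / ref_size),
        round(x2 * image_width / ref_size),
        round(y2 * image_height / ref_size),
    )


# Example: a box in the reference frame mapped onto a 1920 x 1080 image.
print(scale_box((100, 200, 500, 800), 1920, 1080))  # (192, 216, 960, 864)
```

Width and height are scaled independently, so a non-square image comes out with the right aspect ratio.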

You can try it out with my Space, maxiw/Qwen2-VL-Detection.

According to @simonw, Gemini might also be able to do this, but OpenAI’s GPT-4o and Anthropic’s Claude 3 and Claude 3.5 models can’t.
https://simonwillison.net/2024/Aug/26/gemini-bounding-box-visualization/

May I ask what dataset was used for fine-tuning on this task? Was LoRA used, and if so, can the LoRA parameters be shared? Looking forward to your reply!


@fridayfairy this is not fine-tuned. It's the base model just prompted to return bounding boxes in a specific format. The Qwen2-VL models must have been pre-trained on detection data.

I truly like the way Qwen2-VL does things. I fine-tuned from it and wow :)