@merve on Hugging Face: "A real-time object detector much faster and accurate than YOLO with Apache 2.0…"

A real-time object detector that is much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥

D-FINE is a SOTA real-time object detector that runs on a T4 (free Colab) 🤩

> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352

Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩

Regular object detectors try to predict bounding boxes as pixel-perfect (x, y, w, h) coordinates, which is a very rigid target and hard to learn 🥲☹️

D-FINE instead predicts a probability distribution over the bounding box coordinates and refines it iteratively, layer by layer, which makes it more accurate 🤩
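To make the distribution idea concrete, here is a minimal sketch in plain Python, not the actual D-FINE code. It assumes a box edge offset is described by logits over a few discrete bins, that the decoded offset is the expected value under the softmax of those logits, and that each later layer contributes residual logits on top of the previous ones. The bin values and residuals below are made-up illustration numbers, and the evenly spaced bins are a simplifying assumption.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits, bin_values):
    """Decode one edge offset as the expectation over the bins."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))


# Hypothetical candidate offsets (in arbitrary units) for one box edge.
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]

# First layer: no preference yet, so the distribution is uniform.
logits = [0.0, 0.0, 0.0, 0.0, 0.0]

# Made-up residual logits that two later layers add on top.
residuals = [
    [0.0, 0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

offsets = [expected_offset(logits, bin_values)]
for delta in residuals:
    # Refine the distribution itself instead of regressing a new coordinate.
    logits = [l + d for l, d in zip(logits, delta)]
    offsets.append(expected_offset(logits, bin_values))

for layer, offset in enumerate(offsets, start=1):
    print(f"layer {layer}: decoded offset = {offset:.3f}")
```

Each layer only nudges the distribution, so the decoded offset moves gradually toward the bins the model becomes confident about instead of jumping to a single hard coordinate.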

Another core idea behind this model is Global Optimal Localization Self-Distillation ⤵️

The model uses the final layer's distribution output (sort of like a teacher) and distills it into the earlier layers, which makes those early layers more accurate.
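The self-distillation idea can be sketched the same way. This is a simplified illustration under stated assumptions, not the loss used in the paper or in transformers: it treats the last layer's distribution as a fixed teacher and averages a plain KL divergence against every earlier layer. The function names, the `temperature` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(layer_logits, temperature=1.0):
    """Average KL from the final layer (teacher) to each earlier layer.

    layer_logits: one list of bin logits per decoder layer, last one final.
    In a real training loop the teacher would be detached from the graph.
    """
    *student_logits, teacher_logits = layer_logits
    if not student_logits:
        raise ValueError("need at least one earlier layer to distill into")
    teacher = softmax([v / temperature for v in teacher_logits])
    losses = [
        kl_divergence(teacher, softmax([v / temperature for v in s]))
        for s in student_logits
    ]
    return sum(losses) / len(losses)


# Made-up logits for three decoder layers over five bins.
layers = [
    [0.0, 0.0, 0.0, 0.0, 0.0],  # early layer, still uniform
    [0.0, 0.0, 0.0, 1.0, 2.0],  # middle layer
    [0.0, 0.0, 0.0, 1.0, 3.0],  # final layer, acts as the teacher
]

print(f"distillation loss: {self_distillation_loss(layers):.4f}")
```

Minimizing a loss like this pulls the early layers' distributions toward the final layer's, so they localize well without waiting for later refinement.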