Rotated Positional Embedding for Object Detection in Latent Space

The initial positional embeddings are rotated to align with the latent coordinates of the tagged objects, which places each embedding near its corresponding object in the image.
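The card does not say which rotation scheme is used. One plausible reading is a standard rotary position embedding (RoPE) extended to two dimensions, where half of the channels are rotated by the object's latent row and half by its latent column. The sketch below follows that reading. It assumes a hypothetical pixel-to-latent stride of 8 (`LATENT_STRIDE`), and every function name in it is illustrative rather than part of this model's code.

```python
import numpy as np

# Hypothetical: spatial downsampling from pixels to the latent grid.
# Assumed to be 8 here; check the stride of the encoder actually used.
LATENT_STRIDE = 8


def pixel_to_latent(x_px: float, y_px: float, stride: int = LATENT_STRIDE) -> tuple[float, float]:
    """Map a pixel coordinate (e.g. a box centre) to latent (x, y), i.e. (col, row)."""
    return x_px / stride, y_px / stride


def rope_angles(pos: float, dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angles for one axis: pos * base^(-2i/dim), one per channel pair."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return pos * inv_freq  # shape (dim // 2,)


def rotate_pairs(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate each consecutive channel pair (x[2i], x[2i+1]) by angles[i]."""
    x_even, x_odd = x[0::2], x[1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def rope_2d(embedding: np.ndarray, row: float, col: float) -> np.ndarray:
    """2D RoPE (assumed variant): first half of channels encodes row, second half column."""
    d = embedding.shape[-1]
    assert d % 4 == 0, "embedding dim must be divisible by 4"
    half = d // 2
    out = np.empty_like(embedding)
    out[:half] = rotate_pairs(embedding[:half], rope_angles(row, half))
    out[half:] = rotate_pairs(embedding[half:], rope_angles(col, half))
    return out


# Usage: place an initial query embedding at a tagged object's box centre.
col, row = pixel_to_latent(320, 128)  # -> latent col 40.0, row 16.0
query = np.random.default_rng(0).standard_normal(64)
query_at_object = rope_2d(query, row, col)
```

Because each channel pair is only rotated, the embedding's norm is unchanged. The dot product between two rotated embeddings depends only on the difference of their latent positions, which is the property that makes a rotated embedding "land" at an object's location relative to image tokens encoded the same way.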

The image is encoded with Wan2.1, which is built on a multimodal model.

Categories:

- [1] hat
- [2] hair
- [3] sunglasses
- [4] shirt
- [5] skirt
- [6] pants
- [7] dress
- [8] belt
- [9] shoes
- [11] face
- [12] legs
- [14] arms
- [16] bag
- [17] scarf
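The class IDs above are not contiguous: 10, 13 and 15 are not listed. A lookup table is therefore safer than indexing a plain list of names. This is a minimal sketch. `CATEGORIES`, `label_for` and `decode` are hypothetical names, and it is an assumption that the model's raw predictions use exactly these IDs.

```python
# Class IDs and labels as listed on this card. IDs 10, 13 and 15 are not
# listed, so they are deliberately left out rather than guessed.
CATEGORIES: dict[int, str] = {
    1: "hat", 2: "hair", 3: "sunglasses", 4: "shirt", 5: "skirt",
    6: "pants", 7: "dress", 8: "belt", 9: "shoes", 11: "face",
    12: "legs", 14: "arms", 16: "bag", 17: "scarf",
}


def label_for(class_id: int) -> str:
    """Return the label for a class ID, or raise if the ID is not listed."""
    try:
        return CATEGORIES[class_id]
    except KeyError:
        raise ValueError(f"class id {class_id} is not in the published category list") from None


def decode(class_ids: list[int]) -> list[str]:
    """Map a sequence of predicted class IDs to their labels."""
    return [label_for(i) for i in class_ids]


print(decode([1, 16, 17]))  # ['hat', 'bag', 'scarf']
```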

Disclaimer

The documentation and the model require citation and attribution to the author via a link to their Hugging Face profile.

Format: Safetensors
Model size: 41.4M params
Tensor type: F32
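From the parameter count and tensor type listed on this card, the size of the raw weights can be estimated. This back-of-the-envelope sketch treats "41.4M" as exact, although the card rounds it, and it ignores the small Safetensors header, so the real file size will differ slightly.

```python
# Back-of-the-envelope weight size from the figures on this card.
PARAMS = 41.4e6        # "41.4M params" (rounded on the card)
BYTES_PER_PARAM = 4    # F32 = 32 bits = 4 bytes per parameter

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6      # decimal megabytes
size_mib = size_bytes / 2**20   # binary mebibytes
print(f"{size_mb:.1f} MB ({size_mib:.1f} MiB)")  # 165.6 MB (157.9 MiB)
```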