Rotated Positional Embedding for Object Detection in Latent Space
The initial positional embeddings are rotated to align with the latent coordinates of the tagged objects, which places each embedding close to its corresponding object in the image.
The image is encoded with Wan2.1, the multimodal model this work is built on.
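The model card does not specify how the rotation is computed. As a rough illustration only, the sketch below assumes a RoPE-style scheme in which consecutive pairs of embedding values are rotated by angles derived from an object's 2D latent coordinates. The function name `rotate_embedding`, the `base` parameter, and the split of the vector into an x half and a y half are all hypothetical and are not taken from this model.

```python
import math


def rotate_embedding(vec, x, y, base=10000.0):
    """Rotate pairs of `vec` by angles derived from latent coordinates (x, y).

    Assumption: a RoPE-style axial scheme. The first half of the vector is
    rotated according to x, the second half according to y.
    """
    if len(vec) % 4 != 0:
        raise ValueError("embedding length must be a multiple of 4")
    half = len(vec) // 2      # values per axis
    n_pairs = half // 2       # rotated pairs per axis
    out = []
    for axis, coord in enumerate((x, y)):
        offset = axis * half
        for i in range(n_pairs):
            # Lower pair indices rotate faster, as in standard RoPE.
            freq = base ** (-i / n_pairs)
            angle = coord * freq
            a = vec[offset + 2 * i]
            b = vec[offset + 2 * i + 1]
            c, s = math.cos(angle), math.sin(angle)
            out.append(a * c - b * s)
            out.append(a * s + b * c)
    return out


# Hypothetical 8-dimensional embedding and latent position.
embedding = [1.0, 0.0, 0.5, 0.5, 0.0, 1.0, -1.0, 2.0]
rotated = rotate_embedding(embedding, x=3.0, y=5.0)
```

Because each pair undergoes a pure rotation, the vector's norm is unchanged, and an object at latent position (0, 0) leaves the embedding as it was.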
Categories:
- [1] hat
- [2] hair
- [3] sunglasses
- [4] shirt
- [5] skirt
- [6] pants
- [7] dress
- [8] belt
- [9] shoes
- [11] face
- [12] legs
- [14] arms
- [16] bag
- [17] scarf
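For downstream code, the list above can be written as a lookup table. This is a minimal sketch: the name `ID_TO_LABEL` is hypothetical, and the IDs are copied verbatim from the list. Note that the IDs are not contiguous (10, 13, and 15 do not appear), so they should not be used directly as array indices.

```python
# Category IDs and labels, copied from the list above.
ID_TO_LABEL = {
    1: "hat",
    2: "hair",
    3: "sunglasses",
    4: "shirt",
    5: "skirt",
    6: "pants",
    7: "dress",
    8: "belt",
    9: "shoes",
    11: "face",
    12: "legs",
    14: "arms",
    16: "bag",
    17: "scarf",
}

# Reverse lookup from label to ID.
LABEL_TO_ID = {label: idx for idx, label in ID_TO_LABEL.items()}
```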
Disclaimer
The documentation and the model require citation and attribution to the author via a link to their Hugging Face profile.