Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Paper • 2303.05499 • Published Mar 9, 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection Paper • 2303.08131 • Published Mar 14, 2023
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image Paper • 2310.10644 • Published Oct 16, 2023 • 1
InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image Paper • 2311.02826 • Published Nov 6, 2023 • 1
Recognize Anything: A Strong Image Tagging Model Paper • 2306.03514 • Published Jun 6, 2023 • 11
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting Paper • 2307.12972 • Published Jul 24, 2023
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25, 2024 • 2
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR Paper • 2201.12329 • Published Jan 28, 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection Paper • 2203.03605 • Published Mar 7, 2022
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation Paper • 2206.02777 • Published Jun 6, 2022
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Paper • 2403.14610 • Published Mar 21, 2024 • 3
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Paper • 2405.04233 • Published May 7, 2024 • 2
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents Paper • 2407.01511 • Published Jul 1, 2024
TAPTRv2: Attention-based Position Update Improves Tracking Any Point Paper • 2407.16291 • Published Jul 23, 2024 • 11
TAPTR: Tracking Any Point with Transformers as Detection Paper • 2403.13042 • Published Mar 19, 2024
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 14