GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published Mar 13, 2025 • 18
Knowledge Mining with Scene Text for Fine-Grained Recognition Paper • 2203.14215 • Published Mar 27, 2022
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Paper • 2412.13193 • Published Dec 17, 2024 • 1
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published Feb 18, 2025 • 38
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
Deep High-Resolution Representation Learning for Visual Recognition Paper • 1908.07919 • Published Aug 20, 2019 • 2
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28, 2024 • 10
YOLO-World: Real-Time Open-Vocabulary Object Detection Paper • 2401.17270 • Published Jan 30, 2024 • 41