GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published Mar 13 • 18
Knowledge Mining with Scene Text for Fine-Grained Recognition Paper • 2203.14215 • Published Mar 27, 2022
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Paper • 2412.13193 • Published Dec 17, 2024 • 1
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published Feb 18 • 38
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
Deep High-Resolution Representation Learning for Visual Recognition Paper • 1908.07919 • Published Aug 20, 2019 • 2
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28, 2024 • 10
YOLO-World: Real-Time Open-Vocabulary Object Detection Paper • 2401.17270 • Published Jan 30, 2024 • 37