Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14 • 28
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation Paper • 2502.02589 • Published Feb 4 • 10
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 60
COCONut Dataset Collection: a collection of COCONut datasets accepted at CVPR 2024 • 3 items • Updated Apr 29, 2024 • 6