Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport Paper • 2308.01779 • Published Aug 3, 2023 • 1
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning Paper • 2503.00513 • Published Mar 1, 2025 • 1
PixelThink: Towards Efficient Chain-of-Pixel Reasoning Paper • 2505.23727 • Published May 2025 • 4
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance Paper • 2406.09326 • Published Jun 13, 2024 • 1
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24, 2025 • 23
TokenPacker: Efficient Visual Projector for Multimodal LLM Paper • 2407.02392 • Published Jul 2, 2024 • 24