Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport Paper • 2308.01779 • Published Aug 3, 2023 • 1
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning Paper • 2503.00513 • Published Mar 1, 2025 • 1
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction Paper • 2503.23109 • Published Mar 29, 2025
PixelThink: Towards Efficient Chain-of-Pixel Reasoning Paper • 2505.23727 • Published about 1 month ago • 4
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24, 2025 • 23
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance Paper • 2406.09326 • Published Jun 13, 2024 • 1
TokenPacker: Efficient Visual Projector for Multimodal LLM Paper • 2407.02392 • Published Jul 2, 2024 • 24