RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics (arXiv:2506.04308)
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs (arXiv:2506.01674)
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation (arXiv:2505.18842)
Time Blindness: Why Video-Language Models Can't See What Humans Can? (arXiv:2505.24867)
Shifting AI Efficiency From Model-Centric to Data-Centric Compression (arXiv:2505.19147)
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation (arXiv:2505.08665)
Taming the Titans: A Survey of Efficient LLM Inference Serving (arXiv:2504.19720, published Apr 28, 2025)
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning (arXiv:2504.18904, published Apr 26, 2025)
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966, published Apr 29, 2025)
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math (arXiv:2504.21233, published Apr 30, 2025)
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models (arXiv:2504.15279, published Apr 21, 2025)
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference (arXiv:2504.05897, published Apr 8, 2025)
OmniSVG: A Unified Scalable Vector Graphics Generation Model (arXiv:2504.06263, published Apr 8, 2025)
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization (arXiv:2503.19901, published Mar 25, 2025)