Papers - a pangzs Collection

Papers

updated 3 days ago

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper • 2505.04921 • Published May 8 • 185

On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published May 7 • 83

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Paper • 2505.05467 • Published May 8 • 14

Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper • 2508.05547 • Published 17 days ago • 11

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Paper • 2508.02095 • Published 20 days ago • 6

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published 18 days ago • 103

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published 11 days ago • 5

MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation
Paper • 2508.11032 • Published 9 days ago • 2

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper • 2508.09736 • Published 11 days ago • 50

Ovis2.5 Technical Report
Paper • 2508.11737 • Published 9 days ago • 99