ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos • Paper 2503.12542 • Published Mar 16
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks • Paper 2507.11336 • Published Jul 15