Sonic: Shifting Focus to Global Audio Perception in Portrait Animation Paper • 2411.16331 • Published Nov 25, 2024 • 8
HunyuanVideo: A Systematic Framework For Large Video Generative Models Paper • 2412.03603 • Published Dec 3, 2024 • 9
UTRNet: High-Resolution Urdu Text Recognition In Printed Documents Paper • 2306.15782 • Published Jun 27, 2023 • 7
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM Paper • 2405.15341 • Published May 24, 2024 • 7
Echo-DND: A dual noise diffusion model for robust and precise left ventricle segmentation in echocardiography Paper • 2506.15166 • Published Jun 18 • 7