UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published 1 day ago • 42
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published Dec 27, 2024 • 13
On the Trajectory Regularity of ODE-based Diffusion Sampling Paper • 2405.11326 • Published May 18, 2024 • 1
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published Dec 11, 2024 • 45
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion Paper • 2412.03515 • Published Dec 4, 2024 • 27
B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests Paper • 2409.08692 • Published Sep 13, 2024 • 27
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30, 2024 • 39
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 50
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts Paper • 2307.07218 • Published Jul 14, 2023 • 27
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias Paper • 2306.03509 • Published Jun 6, 2023 • 5