- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
  Paper • 2507.02608 • Published • 21
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
  Paper • 2503.10631 • Published
- Mobile Video Diffusion
  Paper • 2412.07583 • Published • 20
Stoney Kang
sikang99
AI & ML interests
Remote Control based on Vision
Recent Activity
upvoted a paper about 1 hour ago
ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
upvoted a paper about 4 hours ago
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
upvoted a paper about 4 hours ago
RynnEC: Bringing MLLMs into Embodied World