Stoney Kang
sikang99
AI & ML interests
Remote Control based on Vision
Recent Activity
upvoted a paper about 6 hours ago
ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
upvoted a paper about 9 hours ago
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
upvoted a paper about 9 hours ago
RynnEC: Bringing MLLMs into Embodied World
Organizations
VLA Models
Vision Language Models for Robotics
- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 128
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 10
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1
3D Generation
- Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
  Paper • 2506.16504 • Published • 23
- Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
  Paper • 2506.15442 • Published • 12
- Dens3R: A Foundation Model for 3D Geometry Prediction
  Paper • 2507.16290 • Published • 7
- GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors
  Paper • 2508.09667 • Published • 5
Diffusion Models
- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
  Paper • 2507.02608 • Published • 21
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
  Paper • 2503.10631 • Published
- Mobile Video Diffusion
  Paper • 2412.07583 • Published • 20
VLM, MLLM
Diffusion Model
Reinforcement Learning
Vision Processing
Simulation
VLA Models
AI Agents
3D Generation
Video Generation