Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published Oct 23, 2025 • 45
AppAgent: Multimodal Agents as Smartphone Users Paper • 2312.13771 • Published Dec 21, 2023 • 54
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models Paper • 2312.13913 • Published Dec 21, 2023 • 24
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation Paper • 2306.17115 • Published Jun 29, 2023 • 11
A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction Paper • 2301.06782 • Published Jan 17, 2023 • 1
VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations Paper • 2310.14487 • Published Oct 23, 2023 • 1
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts Paper • 2312.10763 • Published Dec 17, 2023 • 19
stabilityai/stable-video-diffusion-img2vid-xt Image-to-Video • Updated Jul 10, 2024 • 174k • 3.21k