arxiv:2506.05573

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Published on Jun 5
· Submitted by chenguolin on Jun 9

AI-generated summary

PartCrafter is a unified 3D generative model that synthesizes multiple semantically meaningful 3D meshes from a single image using a compositional latent space and a hierarchical attention mechanism.

Abstract

We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. Unlike existing methods that either produce monolithic 3D shapes or follow two-stage pipelines, i.e., first segmenting an image and then reconstructing each segment, PartCrafter adopts a unified, compositional generation architecture that does not rely on pre-segmented inputs. Conditioned on a single image, it simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes. PartCrafter builds upon a pretrained 3D mesh diffusion transformer (DiT) trained on whole objects, inheriting the pretrained weights, encoder, and decoder, and introduces two key innovations: (1) A compositional latent space, where each 3D part is represented by a set of disentangled latent tokens; (2) A hierarchical attention mechanism that enables structured information flow both within individual parts and across all parts, ensuring global coherence while preserving part-level detail during generation. To support part-level supervision, we curate a new dataset by mining part-level annotations from large-scale 3D object datasets. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes, including parts that are not directly visible in input images, demonstrating the strength of part-aware generative priors for 3D understanding and synthesis. Code and training data will be released.
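To make the two key ideas concrete, here is a minimal NumPy sketch of the hierarchical attention pattern the abstract describes: each part owns its own set of latent tokens (the compositional latent space), tokens first attend only within their part, and then all tokens attend across parts for global coherence. This is an illustrative simplification, not the paper's implementation — real DiT blocks use learned projections, multiple heads, and normalization, which are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the token axis
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def hierarchical_attention(latents):
    """latents: (num_parts, tokens_per_part, dim) — one disentangled
    token set per 3D part (sketch of a compositional latent space)."""
    P, T, D = latents.shape
    # (1) local attention: each token attends only within its own part,
    # preserving part-level detail
    local = attention(latents, latents, latents)              # (P, T, D)
    # (2) global attention: all part tokens attend jointly across parts,
    # letting information flow between parts for global coherence
    flat = local.reshape(1, P * T, D)
    out = attention(flat, flat, flat).reshape(P, T, D)
    return out

# toy example: 4 parts, 8 latent tokens each, 16-dim tokens
latents = np.random.default_rng(0).standard_normal((4, 8, 16))
out = hierarchical_attention(latents)
print(out.shape)  # (4, 8, 16) — per-part token sets are preserved
```

The two-level structure is the point: swapping step (2) for a second within-part pass would keep parts independent, while dropping step (1) would lose the part-level locality the model relies on.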

Community

Paper author and submitter:

PartCrafter: A 3D-native DiT that generates 3D objects in parts 🧩
✅ No extra segmentation
✅ Pure 3D-native DiT
🎯 Generates part-aware 3D objects out of the box

Project page: https://wgsxm.github.io/projects/partcrafter
Code: https://github.com/wgsxm/PartCrafter
Paper: https://arxiv.org/pdf/2506.05573

PartCrafter is a powerful structured 3D generative model designed to simultaneously generate multiple parts and objects from a single RGB image in a ⚡️feed-forward manner⚡️.

The code, pre-trained checkpoints, and a Hugging Face 🤗 demo will be available soon!
Stay tuned for exciting updates! 🚀 @akhaliq

