BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Abstract
A generative visual compositing framework that uses a diffusion model for 3D-grounded scene editing and composition, trained with source masking and simulated object jittering.
We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
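To make the two training strategies concrete, the following is a minimal NumPy sketch of how source masking and simulated object jittering could be implemented as data-side augmentations. All function names, the mask/pose representations, and the hyperparameter values are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def source_masking(source_frame, object_masks, drop_prob=0.5, rng=None):
    """Randomly blank out object regions in the source conditioning frame so the
    compositor learns to rely on the edited target controls, enabling flexible
    changes such as background replacement. `drop_prob` is an assumed value."""
    rng = rng or np.random.default_rng()
    masked = source_frame.copy()
    for m in object_masks:               # each m: (H, W) boolean mask
        if rng.random() < drop_prob:
            masked[m.astype(bool)] = 0.0  # zero out the masked region
    return masked

def simulated_object_jittering(pose, max_trans=0.05, max_rot_deg=5.0, rng=None):
    """Perturb an object's pose (translation + yaw) while the camera stays fixed,
    so training pairs exhibit object motion decoupled from camera motion.
    The perturbation magnitudes here are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    jittered = {
        "translation": pose["translation"] + rng.uniform(-max_trans, max_trans, size=3),
        "yaw_deg": pose["yaw_deg"] + rng.uniform(-max_rot_deg, max_rot_deg),
    }
    return jittered
```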
Community
While state-of-the-art generative models can produce impressive visuals or perform simple edits from text prompts, they often struggle to precisely edit key elements of the input with an accurate understanding of 3D structure and geometry.
We introduce BlenderFusion, a 3D-grounded visual compositing framework that provides precise control over and composition of diverse visual elements, including objects, camera, and background.
The core idea is to combine the best of both worlds: 3D-grounded editing and generative compositing. Instead of relying solely on text prompts, we leverage a graphics engine (Blender) for precise geometric control and flexible manipulation, and then employ a diffusion model as a generative compositor to synthesize a photorealistic final result.
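As a rough illustration of how the original (source) scene and the Blender-edited (target) render might be processed in parallel by a compositor, below is a toy PyTorch sketch. It uses a tiny convolutional module with concatenation-based fusion purely for demonstration; the actual compositor extends a pretrained diffusion model, and every layer size and name here is an assumption.

```python
import torch
import torch.nn as nn

class DualStreamCompositor(nn.Module):
    """Toy sketch of parallel source/target processing followed by fusion.
    Layer sizes and fusion-by-concatenation are illustrative assumptions,
    not the paper's architecture."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.source_enc = nn.Conv2d(channels, hidden, 3, padding=1)
        self.target_enc = nn.Conv2d(channels, hidden, 3, padding=1)
        self.fuse = nn.Conv2d(2 * hidden, channels, 3, padding=1)

    def forward(self, source_render, target_render):
        s = torch.relu(self.source_enc(source_render))   # original-scene stream
        t = torch.relu(self.target_enc(target_render))   # edited-scene stream
        return self.fuse(torch.cat([s, t], dim=1))       # fused composite

# Usage on dummy data:
model = DualStreamCompositor()
src = torch.randn(1, 3, 64, 64)   # render of the original (source) scene
tgt = torch.randn(1, 3, 64, 64)   # Blender render of the edited (target) scene
out = model(src, tgt)             # coarse composite; a real system would instead
                                  # condition a pretrained diffusion model on both
```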
Project page: https://blenderfusion.github.io/