haoningwu committed
Commit c5e4f8b · verified · 1 Parent(s): e12c977

Update README.md

Files changed (1): README.md +134 -16
README.md CHANGED
@@ -1,16 +1,134 @@
- ---
- library_name: trellis
- pipeline_tag: image-to-3d
- license: mit
- language:
- - en
- ---
- # TRELLIS Image Large
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- The image-conditioned version of TRELLIS, a large 3D generative model. It was introduced in the paper [Structured 3D Latents for Scalable and Versatile 3D Generation](https://huggingface.co/papers/2412.01506).
-
- Project page: https://trellis3d.github.io/
-
- Code: https://github.com/Microsoft/TRELLIS
+ ---
+ library_name: trellis
+ pipeline_tag: image-to-3d
+ license: mit
+ language:
+ - en
+ ---
+
+ # SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
+
+ This repository contains the official PyTorch implementation of SceneGen: https://arxiv.org/abs/2508.15769/. Feel free to reach out for discussions!
+
+ **The inference code and pretrained models are now released!**
+
+ <div align="center">
+ <img src="./assets/SceneGen.png">
+ </div>
+
+ ## 🌟 Some Information
+ [Project Page](https://mengmouxu.github.io/SceneGen/) $\cdot$ [Paper](https://arxiv.org/abs/2508.15769/) $\cdot$ [Checkpoints](https://huggingface.co/haoningwu/SceneGen/)
+
+ ## ⏩ News
+ - [2025.8] Our preprint is released on arXiv.
+ - [2025.8] The inference code and checkpoints are released.
+
+ ## 📦 Installation & Pretrained Models
+
+ ### Prerequisites
+ - **Hardware**: An NVIDIA GPU with at least 16GB of memory is required. The code has been verified on NVIDIA A100 and RTX 3090 GPUs.
+ - **Software**:
+   - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain submodules. The code has been tested with CUDA 12.1.
+   - Python 3.8 or higher is required.
+
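+ A quick sanity check of these prerequisites (a minimal sketch, assuming PyTorch is already installed in the active environment):
+ ```python
+ # Minimal sketch: verify a CUDA-capable GPU with at least 16GB of
+ # memory is visible, assuming PyTorch is already installed.
+ import torch
+
+ assert torch.cuda.is_available(), "No CUDA-capable GPU detected."
+ props = torch.cuda.get_device_properties(0)
+ total_gb = props.total_memory / 1024**3
+ print(f"GPU: {props.name} | {total_gb:.1f} GB | CUDA {torch.version.cuda}")
+ assert total_gb >= 16, "SceneGen expects at least 16GB of GPU memory."
+ ```
+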
+ ### Installation Steps
+ 1. Clone the repo:
+ ```sh
+ git clone https://github.com/Mengmouxu/SceneGen.git
+ cd SceneGen
+ ```
+
+ 2. Install the dependencies. The following creates a new conda environment named `scenegen` and installs everything into it:
+ ```sh
+ . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast --demo
+ ```
+ The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`.
+
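+ Once setup finishes, a quick import check helps confirm that the compiled submodules installed cleanly. This is a minimal sketch covering only the modules with unambiguous import names; `diffoctreerast` and `mipgaussian` are omitted because their Python module names may differ from the flag names.
+ ```python
+ # Minimal sketch: confirm key compiled dependencies import cleanly.
+ # Only modules with unambiguous import names are checked here.
+ import importlib
+
+ for module in ["torch", "xformers", "flash_attn", "spconv", "kaolin", "nvdiffrast"]:
+     importlib.import_module(module)
+     print(f"OK: {module}")
+ ```
+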
+ ### Pretrained Models
+ 1. First, create a directory in the SceneGen folder to store the checkpoints:
+ ```sh
+ mkdir -p checkpoints
+ ```
+ 2. Download the pretrained models for **SAM2-Hiera-Large** and **VGGT-1B** from [SAM2](https://huggingface.co/facebook/sam2-hiera-large/) and [VGGT](https://huggingface.co/facebook/VGGT-1B/), then place them in the `checkpoints` directory. (**SAM2** installation and its checkpoints are required for interactive generation with segmentation.)
+ 3. Download our pretrained SceneGen model from [here](https://huggingface.co/haoningwu/SceneGen/) and place it in the `checkpoints` directory as follows:
+ ```
+ SceneGen/
+ ├── checkpoints/
+ │   ├── sam2-hiera-large
+ │   ├── VGGT-1B
+ │   └── scenegen
+ │       ├── ckpts
+ │       └── pipeline.json
+ └── ...
+ ```
+
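+ Alternatively, the downloads can be scripted. Below is a minimal sketch using the `huggingface_hub` Python API (assuming the package is available in your environment); the repo IDs are those linked in the steps above:
+ ```python
+ # Minimal sketch: fetch all three checkpoints into the layout shown
+ # above, assuming the huggingface_hub package is installed.
+ from huggingface_hub import snapshot_download
+
+ for repo_id, local_dir in [
+     ("facebook/sam2-hiera-large", "checkpoints/sam2-hiera-large"),
+     ("facebook/VGGT-1B", "checkpoints/VGGT-1B"),
+     ("haoningwu/SceneGen", "checkpoints/scenegen"),
+ ]:
+     snapshot_download(repo_id=repo_id, local_dir=local_dir)
+ ```
+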
+ ## 💡 Inference
+ We provide two scripts for inference: `inference.py` for batch processing and `interactive_demo.py` for an interactive Gradio demo.
+
+ ### Interactive Demo
+ This script launches a Gradio web interface for interactive scene generation.
+ - **Features**: It uses SAM2 for interactive image segmentation, allows adjusting various generation parameters, and supports scene generation from single or multiple images.
+ - **Usage**:
+ ```sh
+ python interactive_demo.py
+ ```
+
+ > ## 🚀 Quick Start Guide
+ >
+ > ### 📷 Step 1: Input & Segment
+ > 1. **Upload your scene image.**
+ > 2. **Use the mouse to draw bounding boxes** around objects.
+ > 3. Click **"Run Segmentation"** to segment objects.
+ > > *※ For multi-image generation: keep the object annotation order consistent across all images.*
+ >
+ > ### 🗃️ Step 2: Manage Cache
+ > 1. Click **"Add to Cache"** when satisfied with the segmentation.
+ > 2. Repeat Steps 1 and 2 for multiple images.
+ > 3. Use **"Delete Selected"** or **"Clear All"** to manage cached images.
+ >
+ > ### 🎮 Step 3: Generate Scene
+ > 1. Adjust generation parameters (optional).
+ > 2. Click **"Generate 3D Scene"**.
+ > 3. Download the generated GLB file when ready.
+ >
+ > **💡 Pro Tip:** Try the examples below to get started quickly!
+
+ ### Pre-segmented Image Inference
+ This script processes a directory of pre-segmented images.
+ - **Input**: The input folder structure should follow `assets/masked_image_test`, which contains segmented scene images.
+ - **Visualization**: For scenes with ground-truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model. We provide data from the 3D-FUTURE test set as a demonstration.
+ - **Usage**:
+ ```sh
+ python inference.py --gradio
+ ```
+
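+ For scripted use beyond `inference.py`, the sketch below illustrates how a programmatic call might look. It is a hypothetical example modeled on the TRELLIS-style pipeline API that SceneGen builds on; the class name `SceneGenPipeline`, the module path, the `run` signature, the output key, and the image path are all assumptions, and `inference.py` remains the supported entry point.
+ ```python
+ # Hypothetical sketch modeled on the TRELLIS-style pipeline API.
+ # Class name, module path, run() signature, output key, and the image
+ # path are assumptions; inference.py is the supported entry point.
+ from PIL import Image
+ from scenegen.pipelines import SceneGenPipeline  # assumed module path
+
+ # Load the pretrained pipeline from the local checkpoints directory.
+ pipeline = SceneGenPipeline.from_pretrained("checkpoints/scenegen")
+ pipeline.cuda()
+
+ # A pre-segmented scene image (layout as in assets/masked_image_test).
+ image = Image.open("assets/masked_image_test/example/masked_image.png")
+
+ # Generate the 3D scene and export it as a GLB file.
+ outputs = pipeline.run(image)
+ outputs["glb"].export("scene.glb")
+ ```
+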
+ ## 📚 Dataset
+ To be updated soon...
+
+ ## 🏋️‍♂️ Training
+ To be updated soon...
+
+ ## Evaluation
+ To be updated soon...
+
+ ## Citation
+ If you use this code and data for your research or project, please cite:
+
+ @article{meng2025scenegen,
+   author  = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
+   title   = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
+   journal = {arXiv preprint arXiv:2508.15769},
+   year    = {2025},
+ }
+
+ ## TODO
+ - [x] Release Paper
+ - [x] Release Checkpoints & Inference Code
+ - [ ] Release Training Code
+ - [ ] Release Evaluation Code
+ - [ ] Release Data Processing Code
+
+ ## Acknowledgements
+ Many thanks to the codebases from [TRELLIS](https://github.com/microsoft/TRELLIS), [DINOv2](https://github.com/facebookresearch/dinov2), and [VGGT](https://github.com/facebookresearch/vggt).
+
+ ## Contact
+ If you have any questions, please feel free to contact [[email protected]](mailto:[email protected]) and [[email protected]](mailto:[email protected]).