Hooman committed on
Commit f863379 · verified · 1 Parent(s): 20c2285

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +107 -5
README.md CHANGED
@@ -1,5 +1,107 @@
- ---
- license: other
- license_name: autodesk-non-commercial-3d-generative-v1.0
- license_link: LICENSE
- ---
+ ---
+ language:
+ - en
+ license: other
+ license_name: autodesk-non-commercial-3d-generative-v1.0
+ tags:
+ - wala
+ - text-to-depthmap
+ ---
+
+ # Model Card for WaLa-MVDream-DM6
+
+ This model was introduced in the Wavelet Latent Diffusion (WaLa) paper. It generates six-view depth maps from text descriptions to support text-to-3D generation.
+
+ ## Model Details
+
+ ### Model Description
+
+ WaLa-MVDream-DM6 is a fine-tuned version of the MVDream model, adapted to generate six-view depth maps from text inputs. It serves as an intermediate step in WaLa's text-to-3D generation pipeline, producing multi-view depth maps that the WaLa-DM6-1B model then turns into 3D shapes.
+
+ - **Developed by:** Aditya Sanghi, Aliasghar Khani, Chinthala Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani
+ - **Model type:** Text-to-Depth Map Generative Model
+ - **License:** Autodesk Non-Commercial (3D Generative) v1.0
+
+ For more information, please see the [project page](TBD) and [the paper](TBD).
+
+ ### Model Sources
+
+ - **Repository:** [GitHub](https://github.com/AutodeskAILab/WaLa)
+ - **Paper:** [arXiv:TBD](TBD)
+ - **Demo:** [TBD](TBD)
+
+ ## Uses
+
+ ### Direct Use
+
+ This model is released by Autodesk for academic and research purposes only, for theoretical exploration and demonstration of the WaLa 3D generative framework. It is designed to be used in conjunction with WaLa-DM6-1B for text-to-3D generation. Please see [here](TBD) for inference instructions.
+
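+ The intended two-stage flow is sketched below; the function names are placeholders standing in for the actual entry points in the WaLa GitHub repository, not a published API.
+
+ ```python
+ # Hypothetical sketch of WaLa's two-stage text-to-3D flow. Neither
+ # function is a real WaLa API; both stand in for the entry points
+ # provided by the WaLa GitHub repository.
+
+ def generate_depth_maps(prompt: str) -> list:
+     """Stage 1 (this model): text -> six-view depth maps."""
+     raise NotImplementedError("Use the WaLa repository's inference script.")
+
+ def generate_3d_shape(depth_maps: list):
+     """Stage 2 (WaLa-DM6-1B): six depth maps -> a 3D shape."""
+     raise NotImplementedError("Use the WaLa repository's inference script.")
+
+ # depth_maps = generate_depth_maps("a wooden rocking chair")
+ # shape = generate_3d_shape(depth_maps)
+ ```
+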
+ ### Out-of-Scope Use
+
+ The model should not be used for:
+
+ - Commercial purposes
+ - Generation of inappropriate or offensive content
+ - Any usage not in compliance with the [license](https://huggingface.co/ADSKAILab/WaLa-MVDream-DM6/blob/main/LICENSE.md), in particular its "Acceptable Use" section
+
+ ## Bias, Risks, and Limitations
+
+ ### Bias
+
+ - The model may inherit biases present in the text-image datasets used for pre-training and fine-tuning.
+ - The model's performance may vary depending on the complexity and specificity of the input text descriptions.
+
+ ### Risks and Limitations
+
+ - The quality of the generated multi-view depth maps may impact the subsequent 3D shape generation.
+ - The model may occasionally generate depth maps that do not accurately represent the input text or maintain consistency across views.
+
+ ## How to Get Started with the Model
+
+ Please refer to the instructions [here](TBD).
+
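+ Until those instructions are published, the checkpoint files themselves can be fetched from the Hub; the snippet below is a minimal sketch using the standard `huggingface_hub` API (the repository id is taken from this card's license link). Running inference additionally requires the WaLa GitHub repository.
+
+ ```python
+ # Minimal sketch: download this checkpoint's files locally.
+ # Inference itself is driven by the WaLa GitHub repository.
+ from huggingface_hub import snapshot_download
+
+ local_dir = snapshot_download(repo_id="ADSKAILab/WaLa-MVDream-DM6")
+ print(f"Checkpoint downloaded to: {local_dir}")
+ ```
+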
+ ## Training Details
+
+ ### Training Data
+
+ The model was fine-tuned on captions generated for the WaLa dataset. Captions were initially created using the InternVL 2.0 model and then augmented using LLaMA 3.1 to enhance their diversity and richness.
+
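+ For illustration only, caption augmentation with an instruction-tuned LLM might look like the sketch below; the model id and prompt are assumptions, not the paper's exact setup.
+
+ ```python
+ # Illustrative sketch of LLM-based caption augmentation.
+ # Model id and prompt are assumptions, not the paper's setup.
+ from transformers import pipeline
+
+ generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
+
+ caption = "A small wooden chair with four legs."
+ prompt = f"Rewrite this 3D-object caption with richer visual detail: {caption}"
+ augmented = generator(prompt, max_new_tokens=60)[0]["generated_text"]
+ print(augmented)
+ ```
+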
+ ### Training Procedure
+
+ #### Preprocessing
+
+ Captions were generated for each 3D object in the dataset using four renderings and two distinct prompts, then augmented to increase diversity. For depth-map generation, six views were used to ensure comprehensive coverage of the entire object.
+
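+ For intuition, six axis-aligned views (one per face of a bounding cube) cover an object from all sides; whether the paper uses exactly these directions is an assumption in the sketch below.
+
+ ```python
+ # Illustrative only: six axis-aligned camera positions around an object.
+ # The paper's actual camera setup may differ.
+ import numpy as np
+
+ view_dirs = np.array([
+     [ 1, 0, 0], [-1, 0, 0],   # right / left
+     [ 0, 1, 0], [ 0, -1, 0],  # top / bottom
+     [ 0, 0, 1], [ 0, 0, -1],  # front / back
+ ], dtype=float)
+
+ radius = 2.0  # distance of each camera from the object's center
+ camera_positions = radius * view_dirs
+ print(camera_positions)
+ ```
+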
+ #### Training Hyperparameters
+
+ - **Training regime:** Please refer to the paper.
+
+ #### Speeds, Sizes, Times
+
+ [Information not provided in the paper]
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ [Specific evaluation details for this model are not provided in the paper]
+
+ ### Results
+
+ [Specific results for this model are not provided in the paper]
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ The model is based on the MVDream architecture, fine-tuned to generate six-view depth maps from text inputs, and is designed to work in tandem with the WaLa-DM6-1B model for text-to-3D generation. It uses the Stable Diffusion framework, is initialized with weights from MVDream, and is fine-tuned on paired depth-map and text data.
+
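+ Since the model uses the Stable Diffusion framework, its training objective is presumably the standard denoising-diffusion loss; the sketch below shows that objective in simplified form (latent encoding, noise-schedule details, and multi-view attention are omitted, and the function signature is an assumption).
+
+ ```python
+ # Simplified sketch of the standard denoising-diffusion objective
+ # used by Stable Diffusion-style models; WaLa-specific details omitted.
+ import torch
+ import torch.nn.functional as F
+
+ def diffusion_loss(model, x0, text_emb, alphas_cumprod):
+     """One training step: the model predicts the noise added to x0."""
+     b = x0.shape[0]
+     t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
+     noise = torch.randn_like(x0)
+     a = alphas_cumprod[t].view(b, 1, 1, 1)
+     x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # noised depth maps
+     pred = model(x_t, t, text_emb)                # predicted noise
+     return F.mse_loss(pred, noise)
+ ```
+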
+ ### Compute Infrastructure
+
+ #### Hardware
+
+ [TBD]
+
+ ## Citation
+
+ [Citation information to be added after paper publication]