toshas committed
Commit b837595 · 1 Parent(s): 10ef4da

initial commit

.gitignore ADDED
@@ -0,0 +1,5 @@
+ .idea
+ .DS_Store
+ __pycache__
+ gradio_cached_examples
+ Marigold
LICENSE.txt ADDED
@@ -0,0 +1,177 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
README.md CHANGED
@@ -1,13 +1,25 @@
  ---
- title: Marigold Normals
- emoji: 🌍
- colorFrom: yellow
+ title: Marigold Normals Estimation
+ emoji: 🏵️
+ colorFrom: blue
  colorTo: purple
  sdk: gradio
- sdk_version: 4.25.0
+ sdk_version: 4.21.0
  app_file: app.py
- pinned: false
- license: apache-2.0
+ pinned: true
+ license: cc-by-sa-4.0
+ hf_oauth: true
+ hf_oauth_expiration_minutes: 43200
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ This is a demo of Marigold, the state-of-the-art normals estimator for images in the wild.
+ Find out more in our CVPR 2024 paper titled ["Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation"](https://arxiv.org/abs/2312.02145).
+
+ ```
+ @InProceedings{ke2023repurposing,
+     title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
+     author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
+     booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+     year={2024}
+ }
+ ```
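
For reference, a minimal sketch of driving the pipeline added in this commit locally, mirroring `main()` in `app.py` below. It assumes the pinned requirements are installed, the script is run from the repository root, and the `KevinQu7/marigold_normals` UNet checkpoint is accessible with your Hugging Face token; the input path points at one of the bundled example images.

```python
import torch
from PIL import Image
from diffusers import UNet2DConditionModel

from marigold_normals_estimation import MarigoldNormalsPipeline

# Assemble the normals pipeline the same way app.py does: the Marigold depth
# checkpoint provides the VAE/text encoder/scheduler, and the normals UNet is swapped in.
pipe = MarigoldNormalsPipeline.from_pretrained(
    "prs-eth/marigold-v1-0",
    unet=UNet2DConditionModel.from_pretrained(
        "KevinQu7/marigold_normals", subfolder="unet", use_auth_token=True
    ),
)
pipe = pipe.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

# Run inference on a bundled example image and save the outputs.
out = pipe(
    Image.open("files/bee.jpg"),
    denoising_steps=10,
    ensemble_size=10,
    processing_res=768,
)
out.normals_colored.save("bee_normals_colored.png")  # colorized map (PIL image)
print(out.normals_np.shape)  # raw unit normals, [1, 3, H, W] in [-1, 1]
```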
app.py ADDED
@@ -0,0 +1,295 @@
+ # Copyright 2024 Anton Obukhov, ETH Zurich. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # --------------------------------------------------------------------------
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
+ # More information about the method can be found at https://marigoldmonodepth.github.io
+ # --------------------------------------------------------------------------
+
+
+ import functools
+ import os
+
+ import spaces
+ import gradio as gr
+ import numpy as np
+ import torch as torch
+ from PIL import Image
+ from diffusers import UNet2DConditionModel
+
+ from gradio_imageslider import ImageSlider
+ from huggingface_hub import login
+
+ from marigold_normals_estimation import MarigoldNormalsPipeline
+
+
+ def process(
+     pipe,
+     path_input,
+     ensemble_size,
+     denoise_steps,
+     processing_res,
+ ):
+     input_image = Image.open(path_input)
+
+     pipe_out = pipe(
+         input_image,
+         ensemble_size=ensemble_size,
+         denoising_steps=denoise_steps,
+         processing_res=processing_res,
+         batch_size=1 if processing_res == 0 else 0,  # TODO: do we abuse "batch size" notation here?
+         show_progress_bar=True,
+     )
+
+     normals_pred = pipe_out.normals_np
+     normals_colored = pipe_out.normals_colored
+
+     path_output_dir = os.path.splitext(path_input)[0] + "_output"
+     os.makedirs(path_output_dir, exist_ok=True)
+
+     name_base = os.path.splitext(os.path.basename(path_input))[0]
+     path_out_fp32 = os.path.join(path_output_dir, f"{name_base}_normals_fp32.npy")
+     path_out_vis = os.path.join(path_output_dir, f"{name_base}_normals_colored.png")
+
+     np.save(path_out_fp32, normals_pred)
+     normals_colored.save(path_out_vis)
+
+     return (
+         [path_input, path_out_vis],  # TODO: should we unify and output rgb here in depth too?
+         [path_out_fp32, path_out_vis],  # TODO: reintroduce 16bit pngs if it supports 3 channels
+     )
+
+
+ def run_demo_server(pipe):
+     process_pipe = spaces.GPU(functools.partial(process, pipe), duration=120)
+     os.environ["GRADIO_ALLOW_FLAGGING"] = "never"
+
+     with gr.Blocks(
+         analytics_enabled=False,
+         title="Marigold Normals Estimation",
+         css="""
+             #download {
+                 height: 118px;
+             }
+             .slider .inner {
+                 width: 5px;
+                 background: #FFF;
+             }
+             .viewport {
+                 aspect-ratio: 4/3;
+             }
+             h1 {
+                 text-align: center;
+                 display: block;
+             }
+             h2 {
+                 text-align: center;
+                 display: block;
+             }
+             h3 {
+                 text-align: center;
+                 display: block;
+             }
+         """,
+     ) as demo:
+         gr.Markdown(
+             """
+             # Marigold Normals Estimation
+
+             <p align="center">
+             <a title="Website" href="https://marigoldmonodepth.github.io/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                 <img src="https://www.obukhov.ai/img/badges/badge-website.svg">
+             </a>
+             <a title="arXiv" href="https://arxiv.org/abs/2312.02145" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                 <img src="https://www.obukhov.ai/img/badges/badge-pdf.svg">
+             </a>
+             <a title="Github" href="https://github.com/prs-eth/marigold" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                 <img src="https://img.shields.io/github/stars/prs-eth/marigold?label=GitHub%20%E2%98%85&logo=github&color=C8C" alt="badge-github-stars">
+             </a>
+             <a title="Social" href="https://twitter.com/antonobukhov1" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                 <img src="https://www.obukhov.ai/img/badges/badge-social.svg" alt="social">
+             </a>
+             </p>
+             """
+         )
+
+         with gr.Row():
+             with gr.Column():
+                 input_image = gr.Image(
+                     label="Input Image",
+                     type="filepath",
+                 )
+                 with gr.Accordion("Advanced options", open=True):
+                     ensemble_size = gr.Slider(
+                         label="Ensemble size",
+                         minimum=1,
+                         maximum=20,
+                         step=1,
+                         value=10,
+                     )
+                     denoise_steps = gr.Slider(
+                         label="Number of denoising steps",
+                         minimum=10,
+                         maximum=20,
+                         step=1,
+                         value=10,
+                     )
+                     processing_res = gr.Radio(
+                         [
+                             ("Native", 0),
+                             ("Recommended", 768),
+                         ],
+                         label="Processing resolution",
+                         value=768,
+                     )
+                 with gr.Row():
+                     submit_btn = gr.Button(value="Compute Normals", variant="primary")
+                     clear_btn = gr.Button(value="Clear")
+             with gr.Column():
+                 output_slider = ImageSlider(
+                     label="Predicted normals",
+                     type="filepath",
+                     show_download_button=True,
+                     show_share_button=True,
+                     interactive=False,
+                     elem_classes="slider",
+                     position=0.25,
+                 )
+                 files = gr.Files(
+                     label="Output files",
+                     elem_id="download",
+                     interactive=False,
+                 )
+
+         blocks_settings = [ensemble_size, denoise_steps, processing_res]
+         map_id_to_default = {b._id: b.value for b in blocks_settings}
+
+         inputs = [
+             input_image,
+             ensemble_size,
+             denoise_steps,
+             processing_res,
+         ]
+         outputs = [
+             submit_btn,
+             input_image,
+             output_slider,
+             files,
+         ]
+
+         def submit_normals_fn(*args):
+             out = list(process_pipe(*args))
+             out = [gr.Button(interactive=False), gr.Image(interactive=False)] + out
+             return out
+
+         submit_btn.click(
+             fn=submit_normals_fn,
+             inputs=inputs,
+             outputs=outputs,
+             concurrency_limit=1,
+         )
+
+         gr.Examples(
+             fn=submit_normals_fn,
+             examples=[
+                 [
+                     "files/bee.jpg",
+                     10,  # ensemble_size
+                     10,  # denoise_steps
+                     768,  # processing_res
+                 ],
+                 [
+                     "files/cat.jpg",
+                     10,  # ensemble_size
+                     10,  # denoise_steps
+                     768,  # processing_res
+                 ],
+                 [
+                     "files/swings.jpg",
+                     10,  # ensemble_size
+                     10,  # denoise_steps
+                     768,  # processing_res
+                 ],
+                 [
+                     "files/einstein.jpg",
+                     10,  # ensemble_size
+                     10,  # denoise_steps
+                     768,  # processing_res
+                 ],
+             ],
+             inputs=inputs,
+             outputs=outputs,
+             cache_examples=False,
+         )
+
+         def clear_fn():
+             out = []
+             for b in blocks_settings:
+                 out.append(map_id_to_default[b._id])
+             out += [
+                 gr.Button(interactive=True),
+                 gr.Image(value=None, interactive=True),
+                 None, None,  # one value per remaining output: output_slider, files
+             ]
+             return out
+
+         clear_btn.click(
+             fn=clear_fn,
+             inputs=[],
+             outputs=blocks_settings + [
+                 submit_btn,
+                 input_image,
+                 output_slider,
+                 files,
+             ],
+         )
+
+     demo.queue(
+         api_open=False,
+     ).launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+     )
+
+
+ def main():
+     CHECKPOINT_DEPTH = "prs-eth/marigold-v1-0"
+     CHECKPOINT_NORMALS = "KevinQu7/marigold_normals"
+
+     if "HF_TOKEN_LOGIN" in os.environ:
+         login(token=os.environ["HF_TOKEN_LOGIN"])
+
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+     pipe = MarigoldNormalsPipeline.from_pretrained(
+         CHECKPOINT_DEPTH,
+         unet=UNet2DConditionModel.from_pretrained(
+             CHECKPOINT_NORMALS,
+             subfolder='unet',
+             use_auth_token=True,
+         )
+     )
+     try:
+         import xformers
+
+         pipe.enable_xformers_memory_efficient_attention()
+     except Exception:
+         pass  # run without xformers
+
+     pipe = pipe.to(device)
+     run_demo_server(pipe)
+
+
+ if __name__ == "__main__":
+     main()
gradio_patches/examples.py ADDED
@@ -0,0 +1,13 @@
+ from pathlib import Path
+
+ import gradio
+ from gradio.utils import get_cache_folder
+
+
+ class Examples(gradio.helpers.Examples):
+     def __init__(self, *args, directory_name=None, **kwargs):
+         super().__init__(*args, **kwargs, _initiated_directly=False)
+         if directory_name is not None:
+             self.cached_folder = get_cache_folder() / directory_name
+             self.cached_file = Path(self.cached_folder) / "log.csv"
+         self.create()
gradio_patches/flagging.py ADDED
@@ -0,0 +1,165 @@
+ from __future__ import annotations
+
+ import datetime
+ import json
+ import time
+ import uuid
+ from collections import OrderedDict
+ from datetime import datetime, timezone
+ from pathlib import Path
+ from typing import Any
+
+ import gradio
+ import gradio as gr
+ import huggingface_hub
+ from gradio import FlaggingCallback
+ from gradio_client import utils as client_utils
+
+
+ class HuggingFaceDatasetSaver(gradio.HuggingFaceDatasetSaver):
+     def flag(
+         self,
+         flag_data: list[Any],
+         flag_option: str = "",
+         username: str | None = None,
+     ) -> int:
+         if self.separate_dirs:
+             # JSONL files to support dataset preview on the Hub
+             current_utc_time = datetime.now(timezone.utc)
+             iso_format_without_microseconds = current_utc_time.strftime(
+                 "%Y-%m-%dT%H:%M:%S"
+             )
+             milliseconds = int(current_utc_time.microsecond / 1000)
+             unique_id = f"{iso_format_without_microseconds}.{milliseconds:03}Z"
+             if username not in (None, ""):
+                 unique_id += f"_U_{username}"
+             else:
+                 unique_id += f"_{str(uuid.uuid4())[:8]}"
+             components_dir = self.dataset_dir / unique_id
+             data_file = components_dir / "metadata.jsonl"
+             path_in_repo = unique_id  # upload in sub folder (safer for concurrency)
+         else:
+             # Unique CSV file
+             components_dir = self.dataset_dir
+             data_file = components_dir / "data.csv"
+             path_in_repo = None  # upload at root level
+
+         return self._flag_in_dir(
+             data_file=data_file,
+             components_dir=components_dir,
+             path_in_repo=path_in_repo,
+             flag_data=flag_data,
+             flag_option=flag_option,
+             username=username or "",
+         )
+
+     def _deserialize_components(
+         self,
+         data_dir: Path,
+         flag_data: list[Any],
+         flag_option: str = "",
+         username: str = "",
+     ) -> tuple[dict[Any, Any], list[Any]]:
+         """Deserialize components and return the corresponding row for the flagged sample.
+
+         Images/audio are saved to disk as individual files.
+         """
+         # Components that can have a preview on dataset repos
+         file_preview_types = {gr.Audio: "Audio", gr.Image: "Image"}
+
+         # Generate the row corresponding to the flagged sample
+         features = OrderedDict()
+         row = []
+         for component, sample in zip(self.components, flag_data):
+             # Get deserialized object (will save sample to disk if applicable -file, audio, image,...-)
+             label = component.label or ""
+             save_dir = data_dir / client_utils.strip_invalid_filename_characters(label)
+             save_dir.mkdir(exist_ok=True, parents=True)
+             deserialized = component.flag(sample, save_dir)
+
+             # Base component .flag method returns JSON; extract path from it when it is FileData
+             if component.data_model:
+                 data = component.data_model.from_json(json.loads(deserialized))
+                 if component.data_model == gr.data_classes.FileData:
+                     deserialized = data.path
+
+             # Add deserialized object to row
+             features[label] = {"dtype": "string", "_type": "Value"}
+             try:
+                 deserialized_path = Path(deserialized)
+                 if not deserialized_path.exists():
+                     raise FileNotFoundError(f"File {deserialized} not found")
+                 row.append(str(deserialized_path.relative_to(self.dataset_dir)))
+             except (FileNotFoundError, TypeError, ValueError):
+                 deserialized = "" if deserialized is None else str(deserialized)
+                 row.append(deserialized)
+
+             # If component is eligible for a preview, add the URL of the file
+             # Be mindful that images and audio can be None
+             if isinstance(component, tuple(file_preview_types)):  # type: ignore
+                 for _component, _type in file_preview_types.items():
+                     if isinstance(component, _component):
+                         features[label + " file"] = {"_type": _type}
+                         break
+                 if deserialized:
+                     path_in_repo = str(  # returned filepath is absolute, we want it relative to compute URL
+                         Path(deserialized).relative_to(self.dataset_dir)
+                     ).replace("\\", "/")
+                     row.append(
+                         huggingface_hub.hf_hub_url(
+                             repo_id=self.dataset_id,
+                             filename=path_in_repo,
+                             repo_type="dataset",
+                         )
+                     )
+                 else:
+                     row.append("")
+         features["flag"] = {"dtype": "string", "_type": "Value"}
+         features["username"] = {"dtype": "string", "_type": "Value"}
+         row.append(flag_option)
+         row.append(username)
+         return features, row
+
+
+ class FlagMethod:
+     """
+     Helper class that contains the flagging options and calls the flagging method. Also
+     provides visual feedback to the user when flag is clicked.
+     """
+
+     def __init__(
+         self,
+         flagging_callback: FlaggingCallback,
+         label: str,
+         value: str,
+         visual_feedback: bool = True,
+     ):
+         self.flagging_callback = flagging_callback
+         self.label = label
+         self.value = value
+         self.__name__ = "Flag"
+         self.visual_feedback = visual_feedback
+
+     def __call__(
+         self,
+         request: gr.Request,
+         profile: gr.OAuthProfile | None,
+         *flag_data,
+     ):
+         username = None
+         if profile is not None:
+             username = profile.username
+         try:
+             self.flagging_callback.flag(
+                 list(flag_data), flag_option=self.value, username=username
+             )
+         except Exception as e:
+             print(f"Error while sharing: {e}")
+             if self.visual_feedback:
+                 return gr.Button(value="Sharing error", interactive=False)
+         if not self.visual_feedback:
+             return
+         time.sleep(0.8)  # to provide enough time for the user to observe button change
+         return gr.Button(value="Sharing complete", interactive=False)
marigold_normals_estimation.py ADDED
@@ -0,0 +1,500 @@
+ # Copyright 2024 Bingxin Ke, Anton Obukhov, ETH Zurich and The HuggingFace Team. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # --------------------------------------------------------------------------
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
+ # More information about the method can be found at https://marigoldmonodepth.github.io
+ # --------------------------------------------------------------------------
+
+
+ import math
+ from typing import Dict, Union
+
+ import numpy as np
+ import torch
+ from PIL import Image
+ from torch.utils.data import DataLoader, TensorDataset
+ from tqdm.auto import tqdm
+ from transformers import CLIPTextModel, CLIPTokenizer
+
+ from diffusers import (
+     AutoencoderKL,
+     DDIMScheduler,
+     DiffusionPipeline,
+     UNet2DConditionModel,
+ )
+ from diffusers.utils import BaseOutput, check_min_version
+
+
+ # Will error if the minimal version of diffusers is not installed. Remove at your own risk.
+ check_min_version("0.27.0.dev0")
+
+
+ class MarigoldNormalsOutput(BaseOutput):
+     """
+     Output class for Marigold monocular normals prediction pipeline.
+
+     Args:
+         normals_np (`np.ndarray`):
+             Predicted normals map, with normals values in the range of [-1, 1].
+         normals_colored (`None` or `PIL.Image.Image`):
+             Colorized normals map, with the shape of [3, H, W] and values in [0, 1].
+         normals_uncertainty (`None` or `np.ndarray`):
+             Uncalibrated uncertainty (MAD, median absolute deviation) coming from ensembling.
+     """
+
+     normals_np: np.ndarray
+     normals_colored: Union[None, Image.Image]
+     normals_uncertainty: Union[None, np.ndarray]
+
+
+ class MarigoldNormalsPipeline(DiffusionPipeline):
+     """
+     Pipeline for monocular normals estimation using Marigold: https://marigoldmonodepth.github.io.
+
+     This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
+     library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
+
+     Args:
+         unet (`UNet2DConditionModel`):
+             Conditional U-Net to denoise the normals latent, conditioned on image latent.
+         vae (`AutoencoderKL`):
+             Variational Auto-Encoder (VAE) Model to encode and decode images and normals maps
+             to and from latent representations.
+         scheduler (`DDIMScheduler`):
+             A scheduler to be used in combination with `unet` to denoise the encoded image latents.
+         text_encoder (`CLIPTextModel`):
+             Text-encoder, for empty text embedding.
+         tokenizer (`CLIPTokenizer`):
+             CLIP tokenizer.
+     """
+
+     latent_scale_factor = 0.18215
+
+     def __init__(
+         self,
+         unet: UNet2DConditionModel,
+         vae: AutoencoderKL,
+         scheduler: DDIMScheduler,
+         text_encoder: CLIPTextModel,
+         tokenizer: CLIPTokenizer,
+     ):
+         super().__init__()
+
+         self.register_modules(
+             unet=unet,
+             vae=vae,
+             scheduler=scheduler,
+             text_encoder=text_encoder,
+             tokenizer=tokenizer,
+         )
+
+         self.empty_text_embed = None
+
+     @torch.no_grad()
+     def __call__(
+         self,
+         input_image: Image,
+         denoising_steps: int = 10,
+         ensemble_size: int = 10,
+         processing_res: int = 768,
+         match_input_res: bool = True,
+         batch_size: int = 0,
+         save_memory: bool = False,
+         color_map: str = "Spectral",  # TODO change colorization api based on modality
+         show_progress_bar: bool = True,
+         ensemble_kwargs: Dict = None,
+     ) -> MarigoldNormalsOutput:
+         """
+         Function invoked when calling the pipeline.
+
+         Args:
+             input_image (`Image`):
+                 Input RGB (or gray-scale) image.
+             processing_res (`int`, *optional*, defaults to `768`):
+                 Maximum resolution of processing.
+                 If set to 0: will not resize at all.
+             match_input_res (`bool`, *optional*, defaults to `True`):
+                 Resize normals prediction to match input resolution.
+                 Only relevant when `processing_res` is not 0.
+             denoising_steps (`int`, *optional*, defaults to `10`):
+                 Number of diffusion denoising steps (DDIM) during inference.
+             ensemble_size (`int`, *optional*, defaults to `10`):
+                 Number of predictions to be ensembled.
+             batch_size (`int`, *optional*, defaults to `0`):
+                 Inference batch size, no bigger than `ensemble_size`.
+                 If set to 0, the script will automatically decide the proper batch size.
+             save_memory (`bool`, defaults to `False`):
+                 Extra steps to save memory at the cost of performance.
+             show_progress_bar (`bool`, *optional*, defaults to `True`):
+                 Display a progress bar of diffusion denoising.
+             color_map (`str`, *optional*, defaults to `"Spectral"`, pass `None` to skip colorized normals map generation):
+                 Colormap used to colorize the normals map.
+             ensemble_kwargs (`dict`, *optional*, defaults to `None`):
+                 Arguments for detailed ensembling settings.
+         Returns:
+             `MarigoldNormalsOutput`: Output class for Marigold monocular normals prediction pipeline, including:
+             - **normals_np** (`np.ndarray`) Predicted normals map, with normals values in the range of [-1, 1]
+             - **normals_colored** (`None` or `PIL.Image.Image`) Colorized normals map, with the shape of [3, H, W] and
+               values in [0, 1]. None if `color_map` is `None`
+             - **normals_uncertainty** (`None` or `np.ndarray`) Uncalibrated uncertainty (MAD, median absolute deviation)
+               coming from ensembling. None if `ensemble_size = 1`
+         """
+
+         if not match_input_res:
+             assert processing_res is not None
+         assert processing_res >= 0
+         assert denoising_steps >= 1
+         assert ensemble_size >= 1
+
+         W, H = input_image.size
+
+         if processing_res > 0:
+             input_image = self.resize_max_res(
+                 input_image, max_edge_resolution=processing_res
+             )
+         input_image = input_image.convert("RGB")
+         image = np.asarray(input_image)
+
+         rgb = np.transpose(image, (2, 0, 1))  # [H, W, rgb] -> [rgb, H, W]
+         rgb_norm = rgb / 255.0 * 2.0 - 1.0  # [0, 255] -> [-1, 1]
+         rgb_norm = torch.from_numpy(rgb_norm).to(self.dtype)
+         rgb_norm = rgb_norm.to(self.device)
+         assert rgb_norm.min() >= -1.0 and rgb_norm.max() <= 1.0
+
+         duplicated_rgb = torch.stack([rgb_norm] * ensemble_size)
+         single_rgb_dataset = TensorDataset(duplicated_rgb)
+         if batch_size > 0:
+             _bs = batch_size
+         else:
+             _bs = self._find_batch_size(
+                 ensemble_size=ensemble_size,
+                 input_res=max(rgb_norm.shape[1:]),
+                 dtype=self.dtype,
+             )
+
+         single_rgb_loader = DataLoader(
+             single_rgb_dataset, batch_size=_bs, shuffle=False
+         )
+
+         pred = []
+         if show_progress_bar:
+             iterable = tqdm(
+                 single_rgb_loader, desc=" " * 2 + "Inference batches", leave=False
+             )
+         else:
+             iterable = single_rgb_loader
+         for batch in iterable:
+             (batched_img,) = batch
+             pred_raw = self.single_infer(
+                 rgb_in=batched_img,
+                 num_inference_steps=denoising_steps,
+                 show_pbar=show_progress_bar,
+             )
+             pred_raw = pred_raw.detach()
+             if save_memory:
+                 pred_raw = pred_raw.cpu()
+             pred.append(pred_raw)
+
+         pred = torch.concat(pred, dim=0)  # [B,3,H,W]
+         pred_uncert = None
+
+         if save_memory:
+             torch.cuda.empty_cache()
+
+         if ensemble_size > 1:
+             pred, pred_uncert = self.ensemble_normals(
+                 pred, **(ensemble_kwargs or {})
+             )  # [1,3,H,W], [1,H,W]
+
+         if match_input_res:
+             pred = torch.nn.functional.interpolate(
+                 pred, (H, W), mode="bilinear"
+             )  # [1,3,H,W]
+             norm = torch.norm(pred, dim=1, keepdim=True)  # [1,1,H,W]
+             pred /= norm.clamp(min=1e-6)
+
+             if pred_uncert is not None:
+                 pred_uncert = torch.nn.functional.interpolate(
+                     pred_uncert.unsqueeze(1), (H, W), mode="bilinear"
+                 ).squeeze(1)  # [1,H,W]
+
+         # TODO: make X-axis of normals configurable through abstraction
+         if color_map is not None:
+             colored = (pred.squeeze(0) + 1.0) * 0.5
+             colored = (colored * 255).to(torch.uint8)
+             colored = self.chw2hwc(colored).cpu().numpy()
+             colored_img = Image.fromarray(colored)
+         else:
+             colored_img = None
+
+         if pred_uncert is not None:
+             pred_uncert = pred_uncert.cpu().numpy()
+
+         pred = pred.cpu().numpy()  # TODO: np or torch?
+
+         out = MarigoldNormalsOutput(
+             normals_np=pred,
+             normals_colored=colored_img,
+             normals_uncertainty=pred_uncert,
+         )
+
+         return out
+
+     def _encode_empty_text(self):
+         """
+         Encode text embedding for empty prompt.
+         """
+         prompt = ""
+         text_inputs = self.tokenizer(
+             prompt,
+             padding="do_not_pad",
+             max_length=self.tokenizer.model_max_length,
+             truncation=True,
+             return_tensors="pt",
+         )
+         text_input_ids = text_inputs.input_ids.to(self.text_encoder.device)
+         self.empty_text_embed = self.text_encoder(text_input_ids)[0].to(self.dtype)
+
+     @torch.no_grad()
+     def single_infer(
+         self, rgb_in: torch.Tensor, num_inference_steps: int, show_pbar: bool
+     ) -> torch.Tensor:
+         """
+         Perform an individual normals prediction without ensembling.
+
+         Args:
+             rgb_in (`torch.Tensor`):
+                 Input RGB image.
+             num_inference_steps (`int`):
+                 Number of diffusion denoising steps (DDIM) during inference.
+             show_pbar (`bool`):
+                 Display a progress bar of diffusion denoising.
+         Returns:
+             `torch.Tensor`: Predicted normals map.
+         """
+         device = rgb_in.device
+
+         # Set timesteps
+         self.scheduler.set_timesteps(num_inference_steps, device=device)
+         timesteps = self.scheduler.timesteps  # [T]
+
+         # Encode image
+         rgb_latent = self._encode_rgb(rgb_in)
+
+         # Initialize prediction latent with noise
+         pred_latent = torch.randn(
+             rgb_latent.shape, device=device, dtype=self.dtype
+         )  # [B, 4, h, w]
+
+         # Batched empty text embedding
+         if self.empty_text_embed is None:
+             self._encode_empty_text()
+         batch_empty_text_embed = self.empty_text_embed.repeat(
+             (rgb_latent.shape[0], 1, 1)
+         )  # [B, 2, 1024]
+
+         # Denoising loop
+         if show_pbar:
+             iterable = tqdm(
+                 enumerate(timesteps),
+                 total=len(timesteps),
+                 leave=False,
+                 desc=" " * 4 + "Diffusion denoising",
+             )
+         else:
+             iterable = enumerate(timesteps)
+
+         for i, t in iterable:
+             unet_input = torch.cat(
+                 [rgb_latent, pred_latent], dim=1
+             )  # this order is important
+
+             # predict the noise residual
+             noise_pred = self.unet(
+                 unet_input, t, encoder_hidden_states=batch_empty_text_embed
+             ).sample  # [B, 4, h, w]
+
+             # compute the previous noisy sample x_t -> x_t-1
+             pred_latent = self.scheduler.step(noise_pred, t, pred_latent).prev_sample
+
+         # torch.cuda.empty_cache()  # TODO is it really needed here, even if memory saving?
+
+         pred_pixels = self._decode_pred(pred_latent)  # [B, 3, H, W]
+
+         return pred_pixels
+
+     def _encode_rgb(self, rgb_in: torch.Tensor) -> torch.Tensor:
+         """
+         Encode RGB image into latent.
+
+         Args:
+             rgb_in (`torch.Tensor`):
+                 Input RGB image to be encoded.
+
+         Returns:
+             `torch.Tensor`: Image latent.
+         """
+         # encode
+         h = self.vae.encoder(rgb_in)
+         moments = self.vae.quant_conv(h)
+         mean, logvar = torch.chunk(moments, 2, dim=1)
+         # scale latent
+         rgb_latent = mean * self.latent_scale_factor
+         return rgb_latent
+
+     def _decode_pred(self, latent: torch.Tensor) -> torch.Tensor:
+         """
+         Decode normals latent into normals map.
+
+         Args:
+             latent (`torch.Tensor`):
+                 Prediction latent to be decoded [B, 4, h, w].
+
+         Returns:
+             `torch.Tensor`: Decoded prediction map [B, 3, H, W].
+         """
+         # decode latent
+         latent = latent / self.latent_scale_factor
+         latent = self.vae.post_quant_conv(latent)
+         pixels = self.vae.decoder(latent)
+
+         # clip prediction
+         pixels = torch.clip(pixels, -1.0, 1.0)
+
+         # renormalize prediction
+         norm = torch.norm(pixels, dim=1, keepdim=True)
+         pixels = pixels / norm.clamp(min=1e-6)
+
+         return pixels
+
+     @staticmethod
+     def resize_max_res(img: Image.Image, max_edge_resolution: int) -> Image.Image:
+         """
+         Resize image to limit maximum edge length while keeping aspect ratio.
+
+         Args:
+             img (`Image.Image`):
+                 Image to be resized.
+             max_edge_resolution (`int`):
+                 Maximum edge length (pixel).
+
+         Returns:
+             `Image.Image`: Resized image.
+         """
+         original_width, original_height = img.size
+         downscale_factor = min(
+             max_edge_resolution / original_width, max_edge_resolution / original_height
+         )
+
+         new_width = int(original_width * downscale_factor)
+         new_height = int(original_height * downscale_factor)
+
+         resized_img = img.resize((new_width, new_height))
+         return resized_img
+
+     @staticmethod
+     def chw2hwc(chw):
+         assert 3 == len(chw.shape)
+         if isinstance(chw, torch.Tensor):
+             hwc = torch.permute(chw, (1, 2, 0))
+         elif isinstance(chw, np.ndarray):
+             hwc = np.moveaxis(chw, 0, -1)
+         return hwc
+
+     @staticmethod
+     def _find_batch_size(ensemble_size: int, input_res: int, dtype: torch.dtype) -> int:
+         """
+         Automatically search for suitable operating batch size.
+
+         Args:
+             ensemble_size (`int`):
+                 Number of predictions to be ensembled.
+             input_res (`int`):
+                 Operating resolution of the input image.
+
+         Returns:
+             `int`: Operating batch size.
+         """
+         # Search table for suggested max. inference batch size
+         bs_search_table = [
+             # tested on A100-PCIE-80GB
+             {"res": 768, "total_vram": 79, "bs": 35, "dtype": torch.float32},
+             {"res": 1024, "total_vram": 79, "bs": 20, "dtype": torch.float32},
+             # tested on A100-PCIE-40GB
+             {"res": 768, "total_vram": 39, "bs": 15, "dtype": torch.float32},
+             {"res": 1024, "total_vram": 39, "bs": 8, "dtype": torch.float32},
+             {"res": 768, "total_vram": 39, "bs": 30, "dtype": torch.float16},
+             {"res": 1024, "total_vram": 39, "bs": 15, "dtype": torch.float16},
+             # tested on RTX3090, RTX4090
+             {"res": 512, "total_vram": 23, "bs": 20, "dtype": torch.float32},
+             {"res": 768, "total_vram": 23, "bs": 7, "dtype": torch.float32},
+             {"res": 1024, "total_vram": 23, "bs": 3, "dtype": torch.float32},
+             {"res": 512, "total_vram": 23, "bs": 40, "dtype": torch.float16},
+             {"res": 768, "total_vram": 23, "bs": 18, "dtype": torch.float16},
+             {"res": 1024, "total_vram": 23, "bs": 10, "dtype": torch.float16},
+             # tested on GTX1080Ti
+             {"res": 512, "total_vram": 10, "bs": 5, "dtype": torch.float32},
+             {"res": 768, "total_vram": 10, "bs": 2, "dtype": torch.float32},
+             {"res": 512, "total_vram": 10, "bs": 10, "dtype": torch.float16},
+             {"res": 768, "total_vram": 10, "bs": 5, "dtype": torch.float16},
+             {"res": 1024, "total_vram": 10, "bs": 3, "dtype": torch.float16},
+         ]
+
+         if not torch.cuda.is_available():
+             return 1
+
+         total_vram = torch.cuda.mem_get_info()[1] / 1024.0**3
+         filtered_bs_search_table = [s for s in bs_search_table if s["dtype"] == dtype]
+         for settings in sorted(
+             filtered_bs_search_table,
+             key=lambda k: (k["res"], -k["total_vram"]),
+         ):
+             if input_res <= settings["res"] and total_vram >= settings["total_vram"]:
+                 bs = settings["bs"]
+                 if bs > ensemble_size:
+                     bs = ensemble_size
+                 elif bs > math.ceil(ensemble_size / 2) and bs < ensemble_size:
+                     bs = math.ceil(ensemble_size / 2)
+                 return bs
+
+         return 1
+
+     @staticmethod
+     def ensemble_normals(pred_normals: torch.Tensor, reduction: str = "median"):
+         assert reduction in ("median", "mean")
+
+         B, C, H, W = pred_normals.shape
+         assert C == 3
+
+         mean_normals = pred_normals.mean(dim=0, keepdim=True)  # [1,3,H,W]
+         mean_normals_norm = mean_normals.norm(dim=1, keepdim=True)  # [1,1,H,W]
+         mean_normals /= mean_normals_norm.clip(min=1e-6)  # [1,3,H,W]
+
+         sim_cos = (mean_normals * pred_normals).sum(dim=1)  # [B,H,W]
+         sim_acos = sim_cos.arccos()  # [B,H,W]
+         sim_acos = sim_acos.mean(dim=0, keepdim=True) / math.pi  # [1,H,W]
+
+         if reduction == "mean":
+             return mean_normals, sim_acos  # [1,3,H,W], [1,H,W]
+
+         # Find the index of the closest normal vector for each pixel
+         closest_indices = sim_cos.argmax(dim=0, keepdim=True)  # [1,H,W]
+
+         closest_indices = closest_indices.unsqueeze(0).repeat(1, 3, 1, 1)  # [1,3,H,W]
+         closest_normals = torch.gather(pred_normals, 0, closest_indices)
+
+         return closest_normals, sim_acos  # [1,3,H,W], [1,H,W]
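
The `ensemble_normals` staticmethod above reduces a stack of per-pixel normal predictions to a single map plus an uncalibrated angular uncertainty. A minimal sketch of what it returns, on synthetic unit vectors (assuming the pinned requirements are installed so that the module imports cleanly from the repository root):

```python
import torch

from marigold_normals_estimation import MarigoldNormalsPipeline

# Ten noisy predictions of the same 4x4 normals map, normalized to unit length.
preds = torch.randn(10, 3, 4, 4)
preds = preds / preds.norm(dim=1, keepdim=True).clamp(min=1e-6)

# "median" reduction picks, per pixel, the prediction closest to the normalized mean;
# the second output is the mean angular deviation from that mean, scaled by 1/pi.
normals, uncertainty = MarigoldNormalsPipeline.ensemble_normals(preds, reduction="median")
print(normals.shape, uncertainty.shape)  # torch.Size([1, 3, 4, 4]) torch.Size([1, 4, 4])
```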
requirements.txt ADDED
@@ -0,0 +1,126 @@
+ accelerate==0.25.0
+ aiofiles==23.2.1
+ aiohttp==3.9.3
+ aiosignal==1.3.1
+ altair==5.3.0
+ annotated-types==0.6.0
+ anyio==4.3.0
+ async-timeout==4.0.3
+ attrs==23.2.0
+ Authlib==1.3.0
+ certifi==2024.2.2
+ cffi==1.16.0
+ charset-normalizer==3.3.2
+ click==8.0.4
+ cmake==3.29.0.1
+ contourpy==1.2.0
+ cryptography==42.0.5
+ cycler==0.12.1
+ dataclasses-json==0.6.4
+ datasets==2.18.0
+ Deprecated==1.2.14
+ diffusers==0.27.2
+ dill==0.3.8
+ exceptiongroup==1.2.0
+ fastapi==0.110.0
+ ffmpy==0.3.2
+ filelock==3.13.3
+ fonttools==4.50.0
+ frozenlist==1.4.1
+ fsspec==2024.2.0
+ gradio==4.21.0
+ gradio_client==0.12.0
+ gradio_imageslider==0.0.18
+ h11==0.14.0
+ httpcore==1.0.5
+ httpx==0.27.0
+ huggingface-hub==0.22.1
+ idna==3.6
+ imageio==2.34.0
+ imageio-ffmpeg==0.4.9
+ importlib_metadata==7.1.0
+ importlib_resources==6.4.0
+ itsdangerous==2.1.2
+ Jinja2==3.1.3
+ jsonschema==4.21.1
+ jsonschema-specifications==2023.12.1
+ kiwisolver==1.4.5
+ lit==18.1.2
+ markdown-it-py==3.0.0
+ MarkupSafe==2.1.5
+ marshmallow==3.21.1
+ matplotlib==3.8.2
+ mdurl==0.1.2
+ mpmath==1.3.0
+ multidict==6.0.5
+ multiprocess==0.70.16
+ mypy-extensions==1.0.0
+ networkx==3.2.1
+ numpy==1.26.4
+ nvidia-cublas-cu11==11.10.3.66
+ nvidia-cuda-cupti-cu11==11.7.101
+ nvidia-cuda-nvrtc-cu11==11.7.99
+ nvidia-cuda-runtime-cu11==11.7.99
+ nvidia-cudnn-cu11==8.5.0.96
+ nvidia-cufft-cu11==10.9.0.58
+ nvidia-curand-cu11==10.2.10.91
+ nvidia-cusolver-cu11==11.4.0.1
+ nvidia-cusparse-cu11==11.7.4.91
+ nvidia-nccl-cu11==2.14.3
+ nvidia-nvtx-cu11==11.7.91
+ orjson==3.10.0
+ packaging==24.0
+ pandas==2.2.1
+ pillow==10.2.0
+ protobuf==3.20.3
+ psutil==5.9.8
+ pyarrow==15.0.2
+ pyarrow-hotfix==0.6
+ pycparser==2.22
+ pydantic==2.6.4
+ pydantic_core==2.16.3
+ pydub==0.25.1
+ pygltflib==1.16.1
+ Pygments==2.17.2
+ pyparsing==3.1.2
+ python-dateutil==2.9.0.post0
+ python-multipart==0.0.9
+ pytz==2024.1
+ PyYAML==6.0.1
+ referencing==0.34.0
+ regex==2023.12.25
+ requests==2.31.0
+ rich==13.7.1
+ rpds-py==0.18.0
+ ruff==0.3.4
+ safetensors==0.4.2
+ scipy==1.11.4
+ semantic-version==2.10.0
+ shellingham==1.5.4
+ six==1.16.0
+ sniffio==1.3.1
+ spaces==0.25.0
+ starlette==0.36.3
+ sympy==1.12
+ tokenizers==0.15.2
+ tomlkit==0.12.0
+ toolz==0.12.1
+ torch==2.0.1
+ tqdm==4.66.2
+ transformers==4.36.1
+ trimesh==4.0.5
+ triton==2.0.0
+ typer==0.12.0
+ typer-cli==0.12.0
+ typer-slim==0.12.0
+ typing-inspect==0.9.0
+ typing_extensions==4.10.0
+ tzdata==2024.1
+ urllib3==2.2.1
+ uvicorn==0.29.0
+ websockets==11.0.3
+ wrapt==1.16.0
+ xformers==0.0.21
+ xxhash==3.4.1
+ yarl==1.9.4
+ zipp==3.18.1
requirements_min.txt ADDED
@@ -0,0 +1,16 @@
+ gradio==4.21.0
+ gradio-imageslider==0.0.18
+ pygltflib==1.16.1
+ trimesh==4.0.5
+ imageio
+ imageio-ffmpeg
+ Pillow
+
+ spaces==0.25.0
+ accelerate==0.25.0
+ diffusers==0.27.2
+ matplotlib==3.8.2
+ scipy==1.11.4
+ torch==2.0.1
+ transformers==4.36.1
+ xformers==0.0.21