Spaces: fffiloni/text-guided-image-colorization

fffiloni committed about 1 month ago

Commit 1d4b889 (verified) · 1 parent: 6766d4a

add docstrings explanation for MCP server mode
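
Some context on why docstrings matter for MCP server mode: when a Gradio app is launched with `mcp_server=True`, Gradio exposes the app's functions as MCP tools and, according to its documentation, takes the tool and parameter descriptions from each function's docstring and type hints. The sketch below is a rough, stdlib-only illustration of that kind of extraction. It is not Gradio's actual parser: `describe_tool` is a hypothetical helper, and the shortened `process_image` stand-in only mirrors the argument names used in this commit.

```python
import inspect
from typing import Optional, get_type_hints


def process_image(image_path: str, positive_prompt: Optional[str],
                  negative_prompt: Optional[str], seed: int) -> str:
    """Colorize a grayscale or low-color image.

    Args:
        image_path: Path to the input image.
        positive_prompt: Additional descriptive text to guide the generation.
        negative_prompt: Words or phrases to avoid during generation.
        seed: Random seed for reproducible generation.
    """
    return image_path  # stand-in body; the real function runs the models


def describe_tool(fn):
    """Hypothetical helper: split a Google-style docstring into a summary
    line and per-argument descriptions, paired with the type hints."""
    doc = inspect.getdoc(fn) or ""  # getdoc() removes the common indentation
    lines = doc.splitlines()
    summary = lines[0] if lines else ""

    arg_docs, in_args = {}, False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and stripped.endswith(":") and not line.startswith(" "):
            in_args = False  # a new top-level section such as "Returns:"
        elif in_args and ":" in stripped:
            name, text = stripped.split(":", 1)
            arg_docs[name.strip()] = text.strip()

    hints = get_type_hints(fn)
    params = {
        name: {"type": hints.get(name), "description": arg_docs.get(name, "")}
        for name in inspect.signature(fn).parameters
    }
    return {"name": fn.__name__, "description": summary, "parameters": params}


tool = describe_tool(process_image)
for name, info in tool["parameters"].items():
    print(f"{name}: {info['description']}")
```

An argument that has no entry under `Args:` ends up with an empty description here, which is the situation a docstring like the one in this commit avoids.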

Files changed (1)

gradio_ui.py +29 -1


@@ -136,6 +136,34 @@ def process_image(image_path: str,
                   positive_prompt: Optional[str],
                   negative_prompt: Optional[str],
                   seed: int) -> tuple[PIL.Image.Image, str]:
+    """Colorize a grayscale or low-color image using automatic captioning and text-guided diffusion.
+    This function performs image-to-image generation using a ControlNet model and Stable Diffusion XL,
+    guided by a text caption extracted from the image itself using a BLIP captioning model. Optional
+    prompts (positive and negative) can further influence the output style or content.
+    Process Overview:
+        1. The input image is loaded and resized to 512x512 for inference.
+        2. A BLIP model generates a caption describing the image content.
+        3. The caption is cleaned using a filtering function to remove misleading or unwanted terms.
+        4. A prompt is constructed by combining the user-provided positive prompt with the caption.
+        5. A ControlNet-guided image is generated using the SDXL pipeline.
+        6. The output image's color channels (A and B in LAB space) are applied to the original luminance (L)
+           of the control image to preserve structure while transferring color.
+        7. The image is resized back to the original resolution and returned.
+    Args:
+        image_path: Path to the grayscale or lightly colored input image (JPEG/PNG).
+        positive_prompt: Additional descriptive text to enhance or guide the generation.
+        negative_prompt: Words or phrases to avoid during generation (e.g., "blurry", "monochrome").
+        seed: Random seed for reproducible generation.
+    Returns:
+        A tuple containing:
+            - A colorized PIL image based on the input and generated caption.
+            - The cleaned caption string used to guide the generation.
+    """
     torch.manual_seed(seed)
     image = PIL.Image.open(image_path)
@@ -189,7 +217,7 @@ def create_interface():
 def main():
     interface = create_interface()
-    interface.launch(ssr_mode=False)
+    interface.launch(ssr_mode=False, mcp_server=True)
 if __name__ == "__main__":
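
Step 6 of the docstring's process overview (applying the generated A and B channels to the original L channel) can be sketched as follows. This is a minimal pure-Python illustration working on lists of RGB pixels, assuming sRGB input and a D65 white point. The function names are hypothetical and not taken from `gradio_ui.py`, and the app itself presumably uses an image library's vectorized LAB conversion rather than anything like this.

```python
# Illustrative only: per-pixel sRGB <-> CIELAB (D65) and an L/ab recombination.
WHITE = (0.95047, 1.0, 1.08883)   # D65 reference white (X, Y, Z)
EPS = 216 / 24389                 # (6/29)^3, threshold of the LAB transfer function
KAPPA = 24389 / 27                # (29/3)^3


def _srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _linear_to_srgb(c):
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def rgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to (L, a, b)."""
    r, g, b = (_srgb_to_linear(v / 255.0) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        return t ** (1 / 3) if t > EPS else (KAPPA * t + 16) / 116

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_rgb(lab):
    """Convert (L, a, b) back to an 8-bit sRGB triple, clamping out-of-gamut values."""
    L, a, b = lab
    fy = (L + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200

    def f_inv(t):
        return t ** 3 if t ** 3 > EPS else (116 * t - 16) / KAPPA

    x, y, z = (f_inv(v) * w for v, w in zip((fx, fy, fz), WHITE))
    r = 3.2404542 * x - 1.5371385 * y - 0.4985314 * z
    g = -0.9692660 * x + 1.8760108 * y + 0.0415560 * z
    bl = 0.0556434 * x - 0.2040259 * y + 1.0572252 * z
    return tuple(
        round(255 * min(1.0, max(0.0, _linear_to_srgb(c)))) for c in (r, g, bl)
    )


def transfer_color(original_pixels, generated_pixels):
    """Keep L from the original pixels and take a/b from the generated ones."""
    out = []
    for orig, gen in zip(original_pixels, generated_pixels):
        L_orig, _, _ = rgb_to_lab(orig)
        _, a_gen, b_gen = rgb_to_lab(gen)
        out.append(lab_to_rgb((L_orig, a_gen, b_gen)))
    return out


gray = [(128, 128, 128), (60, 60, 60)]        # stand-in for the input image
colored = [(200, 120, 90), (40, 70, 110)]     # stand-in for the model output
result = transfer_color(gray, colored)
print(result)
```

Because only the a and b channels come from the generated image, any drift in brightness or fine structure introduced by the diffusion model is discarded, which is the structure-preserving effect the docstring describes.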