🎨 AnySD
This is the official model of AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.
Summary
Since AnyEdit contains a wide range of editing instructions across various domains, it holds promising potential for developing a powerful editing model to address high-quality editing tasks. However, training such a model poses three additional challenges: (a) aligning the semantics of various multi-modal inputs; (b) identifying the semantic edits within each domain to control the granularity and scope of the edits; (c) coordinating the complexity of various editing tasks to prevent catastrophic forgetting. To this end, we propose a novel AnyEdit Stable Diffusion approach (🎨AnySD) to cope with various editing tasks in the real world.
Architecture of 🎨AnySD. 🎨AnySD is a novel architecture that supports three conditions (original image, editing instruction, visual prompt) for various editing tasks.
Our model is based on the awesome SD 1.5.
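For readers unfamiliar with the backbone, the snippet below is a minimal sketch of inspecting the SD 1.5 base weights with the standard diffusers API. The checkpoint id runwayml/stable-diffusion-v1-5 is an illustrative assumption; AnySDPipeline loads and wraps the base weights internally, so this step is not required for inference.

# Minimal sketch (assumption): looking at the SD 1.5 backbone that AnySD builds on.
# The checkpoint id is illustrative; AnySDPipeline handles the base model itself.
import torch
from diffusers import StableDiffusionPipeline

base = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
print(base.unet.config.sample_size)  # 64: the latent resolution of SD 1.5 at 512x512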
Inference
To run the model, you can refer to the code in anysd/infer.py, specifically:
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='./' python3 anysd/infer.py
The script content is:
import os
from tqdm import tqdm
from anysd.src.model import AnySDPipeline, choose_expert
from anysd.train.valid_log import download_image
from anysd.src.utils import choose_book, get_experts_dir

if __name__ == "__main__":
    # Fetch the expert checkpoints and the task-embedding book
    expert_file_path = get_experts_dir(repo_id="WeiChow/AnySD")
    book_dim, book = choose_book('all')
    task_embs_checkpoints = expert_file_path + "task_embs.bin"
    adapter_checkpoints = {
        "global": expert_file_path + "global.bin",
        "viewpoint": expert_file_path + "viewpoint.bin",
        "visual_bbox": expert_file_path + "visual_bbox.bin",
        "visual_depth": expert_file_path + "visual_dep.bin",
        "visual_material_transfer": expert_file_path + "visual_mat.bin",
        "visual_reference": expert_file_path + "visual_ref.bin",
        "visual_scribble": expert_file_path + "visual_scr.bin",
        "visual_segment": expert_file_path + "visual_seg.bin",
        "visual_sketch": expert_file_path + "visual_ske.bin",
    }

    pipeline = AnySDPipeline(adapters_list=adapter_checkpoints, task_embs_checkpoints=task_embs_checkpoints)

    os.makedirs('./assets/anysd-test/', exist_ok=True)

    case = [
        {
            "edit": "Put on a pair of sunglasses",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        },
        {
            "edit": "Make her a wizard",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        }
    ]

    for index, item in enumerate(tqdm(case)):
        # Route the edit type to the matching expert adapter
        mode = choose_expert(mode=item["edit_type"])
        if mode == 'general':
            images = pipeline(
                prompt=item['edit'],
                original_image=download_image(item['image_file']),
                guidance_scale=3,
                num_inference_steps=100,
                original_image_guidance_scale=3,
                adapter_name="general",
            )[0]
        else:
            # Visual tasks additionally condition on a reference image and a task embedding
            images = pipeline(
                prompt=item['edit'],
                reference_image=download_image(item['reference_image_file']) if item.get('reference_image_file') is not None else None,
                original_image=download_image(item['image_file']),
                guidance_scale=1.5,
                num_inference_steps=100,
                original_image_guidance_scale=2,
                reference_image_guidance_scale=0.8,
                adapter_name=mode,
                e_code=book[item["edit_type"]],
            )[0]
        images.save(f"./assets/anysd-test/{index}.jpg")
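The two cases above only exercise the "general" expert. For the visual tasks handled by the else branch, a case entry would also carry a reference image. The sketch below shows what such an entry might look like; the edit_type value and file paths are illustrative assumptions (valid edit_type strings are the keys returned by choose_book('all')).

# Hypothetical case entry for a visual task (field names follow the script above;
# the edit_type value and file paths are illustrative assumptions).
case_visual = {
    "edit": "Replace the cat with the one in the reference image",
    "edit_type": "visual_reference",             # must be a key in `book`
    "image_file": "./assets/room.jpg",           # hypothetical input image
    "reference_image_file": "./assets/cat.jpg"   # hypothetical reference image
}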
We reorganized the AnyEdit data when we released it to the public and retrained the model on the reorganized data, so the results will differ slightly from those in the paper, although they are broadly similar. Note that the hyperparameters (the guidance scales and the number of inference steps) also have a considerable impact on the results.
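Because the guidance scales noticeably affect the output, a quick way to probe their effect is to sweep a few values with the pipeline built in the script above. The values below are illustrative, not recommended settings.

# Small sketch: sweep text guidance for a 'general' edit to see how sensitive the result is.
# Reuses `pipeline` and `download_image` from the inference script above.
for gs in (1.5, 3.0, 5.0):
    image = pipeline(
        prompt="Put on a pair of sunglasses",
        original_image=download_image("./assets/woman.jpg"),
        guidance_scale=gs,                   # text guidance
        original_image_guidance_scale=3,     # image guidance
        num_inference_steps=100,
        adapter_name="general",
    )[0]
    image.save(f"./assets/anysd-test/sweep_gs{gs}.jpg")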
Citation
@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}