metadata

license: other
license_name: apache-2.0
license_link: https://huggingface.co/PromptEnhancer/PromptEnhancer-32B/blob/main/License.txt
language:
  - zh
  - en
tags:
  - text-to-image
  - prompt-enhancement
  - prompt-rewriting
  - chain-of-thought
pipeline_tag: text-generation
library_name: transformers
base_model: Qwen/Qwen2.5-VL-32B-Instruct

PromptEnhancerV2 (32B) - Img2Img Edit

PromptEnhancerV2 is a multimodal language model fine-tuned to enhance and rewrite image-to-image editing instructions. It refines an instruction using both the input text and the provided image, preserving the original intent while producing a clearer, structured, and logically consistent prompt for downstream image editing tasks.

Model Details

Model Description

PromptEnhancerV2 (Img2Img Edit) is a specialized vision-language prompt rewriting model that employs chain-of-thought reasoning to enhance user editing instructions with visual context.

Model type: Vision-Language Model for Prompt Enhancement
Language(s) (NLP): Chinese (zh), English (en)
License: Apache-2.0
Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct

Model Sources

Repository: https://github.com/ximinng/PromptEnhancer
Paper: https://arxiv.org/abs/2509.04545
Homepage: https://hunyuan-promptenhancer.github.io/

How to Get Started with the Model

1. Clone the repository and install the dependencies:

git clone https://github.com/ximinng/PromptEnhancer.git
cd PromptEnhancer
pip install -r requirements.txt

2. Download the model:

huggingface-cli download PromptEnhancer/PromptEnhancer-Img2img-Edit --local-dir ./models/promptenhancer-img2img-edit
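
The same download can likely be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `looks_downloaded` and `download_model` are hypothetical, and the `config.json` check is only a heuristic based on how Transformers checkpoints are usually laid out, not something this repository documents.

```python
from pathlib import Path

REPO_ID = "PromptEnhancer/PromptEnhancer-Img2img-Edit"
LOCAL_DIR = "./models/promptenhancer-img2img-edit"


def looks_downloaded(local_dir: str) -> bool:
    """Heuristic (assumption): a Transformers checkpoint directory holds config.json."""
    return (Path(local_dir) / "config.json").is_file()


def download_model(repo_id: str = REPO_ID, local_dir: str = LOCAL_DIR) -> str:
    """Fetch the full model repository into local_dir and return the local path."""
    # Imported lazily so looks_downloaded() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Typical use (downloads a 32B checkpoint, so it is left commented out here):
# if not looks_downloaded(LOCAL_DIR):
#     download_model()
```

Checking for an existing directory first avoids starting a large transfer when the files are already in place.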

3. Use the model:

from inference.prompt_enhancer_img2img import PromptEnhancerImg2Img

# Initialize the model
models_root_path = "./models/promptenhancer-img2img-edit"
enhancer = PromptEnhancerImg2Img(model_path=models_root_path, device_map="auto")

# Enhance an editing instruction with image context (Chinese or English)
# English: "Remove the watermark at the bottom of the image and keep the main subject unchanged"
edit_instruction = "去掉图片底部的水印，保留主体不变"
image_path = "./examples/sample_image.png"

enhanced_prompt = enhancer.predict(
    edit_instruction=edit_instruction,
    image_path=image_path,
    temperature=0.1,
    top_p=0.9,
    max_new_tokens=2048
)

print("Enhanced:", enhanced_prompt)
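
To process several instruction/image pairs, the single call above could be wrapped in a small loop. This is a minimal sketch, not part of the repository: `enhance_batch` is a hypothetical helper, and it assumes only that the enhancer object exposes the `predict(edit_instruction=..., image_path=..., ...)` call shown above and returns a string.

```python
from pathlib import Path
from typing import Any, Iterable, List, Tuple


def enhance_batch(
    enhancer: Any,
    jobs: Iterable[Tuple[str, str]],
    **gen_kwargs: Any,
) -> List[str]:
    """Run enhancer.predict() on each (edit_instruction, image_path) pair.

    Inputs are validated up front so a bad path fails before the model is invoked.
    Extra keyword arguments (e.g. temperature, top_p) are passed through unchanged.
    """
    results: List[str] = []
    for instruction, image_path in jobs:
        instruction = instruction.strip()
        if not instruction:
            raise ValueError("edit instruction must not be empty")
        if not Path(image_path).is_file():
            raise FileNotFoundError(f"image not found: {image_path}")
        results.append(
            enhancer.predict(
                edit_instruction=instruction,
                image_path=str(image_path),
                **gen_kwargs,
            )
        )
    return results
```

With the `enhancer` created earlier, a call might look like `enhance_batch(enhancer, [("Remove the watermark", "./examples/sample_image.png")], temperature=0.1, top_p=0.9, max_new_tokens=2048)`.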

Citation

If you find this model useful, please consider citing:

BibTeX:

@article{promptenhancer,
  title={PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting},
  author={Wang, Linqing and Xing, Ximing and Cheng, Yiji and Zhao, Zhiyuan and Donghao, Li and Tiankai, Hang and Zhenxi, Li and Tao, Jiale and Wang, QiXun and Li, Ruihuang and Chen, Comi and Li, Xin and Wu, Mingrui and Deng, Xinchi and Gu, Shuyang and Wang, Chunyu and Lu, Qinglin},
  journal={arXiv preprint arXiv:2509.04545},
  year={2025}
}