Commit ce67d4c · xwwshen · verified · 1 Parent(s): f4f46c8

Update README.md

  1. README.md +16 -6
README.md CHANGED
@@ -40,15 +40,25 @@ pipeline_tag: text-to-image
 
  ## Abstract
  Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
-
- ## Quick Started
  ### Checkpoints
  The `diffusion_pytorch_model.safetensors` is online version of SRPO based on [FLUX.1 Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), trained on HPD dataset with [HPSv2](https://github.com/tgxs002/HPSv2)
+ ## 🔑 Inference
+
+ ### Using ComfyUI
+
+ You can use it in [ComfyUI](https://github.com/comfyanonymous/ComfyUI).
+
+ Load the following image in ComfyUI to get the workflow, or load the JSON file directly [SRPO-workflow](comfyui/SRPO-workflow.json):
 
- #### Inference
- Replace the `diffusion_pytorch_model.safetensors` of FLUX
- ```python
+ Tip: The workflow JSON info was added to the image file.
+
+ ![Example](comfyui/SRPO-workflow.png)
+
+ ### Quick start
+ ```bash
  from diffusers import FluxPipeline
+ from safetensors.torch import load_file
+
  prompt='The Death of Ophelia by John Everett Millais, Pre-Raphaelite painting, Ophelia floating in a river surrounded by flowers, detailed natural elements, melancholic and tragic atmosphere'
  pipe = FluxPipeline.from_pretrained('./data/flux',
  torch_dtype=torch.bfloat16,
@@ -61,7 +71,7 @@ image = pipe(
  guidance_scale=3.5,
  height=1024,
  width=1024,
- num_inference_steps=infer_step,
+ num_inference_steps=50,
  max_sequence_length=512,
  generator=generator
  ).images[0]
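
The Quick start snippet is split across two hunks, so the lines between `torch_dtype=torch.bfloat16,` and `guidance_scale=3.5,` are not visible in this diff. The following is a minimal sketch of how the visible pieces might fit together, not the README's actual code. It assumes that the newly imported `load_file` is used to load the SRPO checkpoint into the pipeline's transformer, that the checkpoint lives at the hypothetical path `./srpo/diffusion_pytorch_model.safetensors`, and that the generator is seeded on the CPU. The helper names `build_call_kwargs` and `generate_image` are illustrative only. The sketch also imports `torch`, which the visible lines use without showing an import.

```python
# './data/flux' appears in the README; the checkpoint location is an assumption.
FLUX_DIR = "./data/flux"
SRPO_CHECKPOINT = "./srpo/diffusion_pytorch_model.safetensors"  # hypothetical path


def build_call_kwargs(prompt, generator=None, num_inference_steps=50):
    """Collect the sampling arguments visible in the diff into one dict."""
    return {
        "prompt": prompt,
        "guidance_scale": 3.5,
        "height": 1024,
        "width": 1024,
        "num_inference_steps": num_inference_steps,
        "max_sequence_length": 512,
        "generator": generator,
    }


def generate_image(prompt, seed=0, device="cuda"):
    """Load FLUX.1-dev, swap in the SRPO transformer weights, sample one image."""
    # Heavy imports stay local so this module can be imported without a GPU stack.
    import torch
    from diffusers import FluxPipeline
    from safetensors.torch import load_file

    pipe = FluxPipeline.from_pretrained(FLUX_DIR, torch_dtype=torch.bfloat16)

    # Assumption: the SRPO file is a drop-in state dict for the FLUX transformer.
    state_dict = load_file(SRPO_CHECKPOINT)
    pipe.transformer.load_state_dict(state_dict)
    pipe = pipe.to(device)

    # Seeding on the CPU is an illustrative choice, not taken from the README.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return pipe(**build_call_kwargs(prompt, generator=generator)).images[0]
```

Under these assumptions, `generate_image(prompt).save("out.png")` would write one 1024x1024 sample. Keeping the sampling arguments in `build_call_kwargs` makes the change in this commit, replacing the undefined `infer_step` with a literal `50`, visible as a single default value.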