tencent
/

SRPO

Text-to-Image

Diffusers

Safetensors

Model card Files Files and versions

xet

Community

Zhiminli commited on 10 days ago

Commit

6f3a64c

verified ·

1 Parent(s): 008e054

Update README.md

Browse files

Files changed (1) hide show

README.md +14 -0

README.md CHANGED Viewed

@@ -40,6 +40,20 @@ pipeline_tag: text-to-image
 ## Abstract
 Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
 ### Checkpoints
 The `diffusion_pytorch_model.safetensors` is online version of SRPO based on [FLUX.1 Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), trained on HPD dataset with [HPSv2](https://github.com/tgxs002/HPSv2)
 ## 🔑 Inference

 ## Abstract
 Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
+## Acknowledgement
+We sincerely appreciate contributions from the research community to this project. Below are quantized versions developed by fellow researchers.
+1. 8bit(fp8_e4m3fn/Q8_0) version by wikeeyang: https://huggingface.co/wikeeyang/SRPO-Refine-Quantized-v1.0
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6645835a2b57c619a19cc0c4/BATJ0bW_0QPhkN5WY0Q1H.png)
+2. bf16 version by rockerBOO: https://huggingface.co/rockerBOO/flux.1-dev-SRPO
+⚠️ Note: Note: When loading weights in ComfyUI, avoid direct conversion of FP32 weights to FP8 format, as this may result in incomplete denoising. For official weights in this repository, FP32/BF16 loading is recommended.
 ### Checkpoints
 The `diffusion_pytorch_model.safetensors` is online version of SRPO based on [FLUX.1 Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), trained on HPD dataset with [HPSv2](https://github.com/tgxs002/HPSv2)
 ## 🔑 Inference