RealGRPO FLUX DiT Weights

This repository provides DiT weights fine-tuned from FLUX.1-dev with GRPO using the RealGRPO strategy.

RealGRPO targets a common post-training issue in image generation: reward hacking (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts).
Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.

What Is Included

Fine-tuned FLUX DiT weights (GRPO post-training).
Training objective based on contrastive positive/negative style guidance.
Compatibility with the RealGRPO codebase inference scripts.

Method (Brief)

RealGRPO uses an LLM to generate prompt-specific style pairs:

positive style cues (pos_style)
negative style cues (neg_style)

The reward encourages similarity to positive cues while penalizing negative cues, helping the model avoid artifact-prone shortcuts during alignment.
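The contrastive idea above can be illustrated with a minimal sketch. This is not the RealGRPO reward implementation: it assumes the generated image and the two style cues have already been embedded into a shared vector space by some image-text encoder, and the names `contrastive_style_reward` and `neg_weight` are hypothetical, introduced only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contrastive_style_reward(image_emb, pos_emb, neg_emb, neg_weight=1.0):
    """Assumed reward form: similarity to pos_style minus weighted similarity to neg_style.

    image_emb, pos_emb, neg_emb are plain lists of floats produced by a
    shared encoder (an assumption; the encoder is not specified here).
    """
    return cosine(image_emb, pos_emb) - neg_weight * cosine(image_emb, neg_emb)
```

Under this assumed form, an image that resembles the positive cues and not the negative ones scores high, while an image that drifts toward the negative cues is penalized even if it still partly matches the positive ones.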

Note: This release contains DiT alignment weights, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its transformer directory with the contents of this repository.
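The replacement step can be scripted once both repositories are available locally. The following is a minimal sketch using only the Python standard library; the helper name `replace_transformer_dir` is hypothetical, and skipping `README.md` and `.git*` files during the copy is an assumption made here for tidiness, not a requirement stated by this release.

```python
import shutil
from pathlib import Path


def replace_transformer_dir(flux_dir, realgrpo_dir, subdir="transformer"):
    """Replace <flux_dir>/<subdir> with the contents of realgrpo_dir.

    flux_dir:     local copy of black-forest-labs/FLUX.1-dev
    realgrpo_dir: local copy of this repository
    Returns the path of the replaced directory.
    """
    flux_dir = Path(flux_dir)
    realgrpo_dir = Path(realgrpo_dir)
    if not realgrpo_dir.is_dir():
        raise FileNotFoundError(f"not a directory: {realgrpo_dir}")

    target = flux_dir / subdir
    if target.exists():
        # Remove the original FLUX.1-dev transformer weights.
        shutil.rmtree(target)

    # Copy the fine-tuned weights and config; repository metadata is skipped
    # (an assumption, these files are not needed for inference).
    shutil.copytree(
        realgrpo_dir,
        target,
        ignore=shutil.ignore_patterns(".git*", "README.md"),
    )
    return target
```

For example, `replace_transformer_dir("FLUX.1-dev", "RealGRPO")` would operate on two local directories with those (hypothetical) names. Because the original transformer directory is deleted, keep a separate copy of FLUX.1-dev if you also want to run the vanilla baseline.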
