🎨 Imagine: Words To Visuals

🌟 What is Imagine?

Imagine is an all-in-one framework for creating visually striking posters. It combines:

Accurate text rendering
Seamless integration of abstract art
Bold, eye-catching layouts
A cohesive and harmonious visual style

🚀 Quick Start

🔧 Installation

# Clone the repository
git clone https://github.com/skylinemusiccds/Imagine.git
cd Imagine

# Create conda environment
conda create -n imagine python=3.11
conda activate imagine

# Install dependencies
pip install -r requirements.txt
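
After installation, a quick check can confirm that the environment is usable. The sketch below assumes the two packages imported in the Easy Usage section (`torch` and `diffusers`) are the ones worth verifying; `missing_packages` is a hypothetical helper and is not part of the repository.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Python 3.11 matches the conda environment created above.
    print("Python:", sys.version.split()[0])
    # Assumption: torch and diffusers are the key dependencies;
    # requirements.txt may list more.
    missing = missing_packages(["torch", "diffusers"])
    print("Missing packages:", missing or "none")
```

An empty result means both packages are importable from the active environment.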

🚀 Easy Usage

Imagine is distributed as a drop-in transformer for the FLUX.1-dev pipeline in diffusers, so it fits into custom workflows and works with any tool that accepts a standard diffusers pipeline.

Loading the model takes three steps:

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# 1. Define model IDs and settings
pipeline_id = "black-forest-labs/FLUX.1-dev"
imagine_transformer_id = "Satyam-Singh/Imagine"
device = "cuda"
dtype = torch.bfloat16

# 2. Load the base pipeline
pipe = FluxPipeline.from_pretrained(pipeline_id, torch_dtype=dtype)

# 3. The key step: simply replace the original transformer with our Imagine model
pipe.transformer = FluxTransformer2DModel.from_pretrained(
    imagine_transformer_id,
    torch_dtype=dtype
)
pipe.to(device)

# Now, `pipe` is a standard diffusers pipeline ready for inference with your own logic.
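
With the pipeline loaded as above, generation follows the usual diffusers call pattern. The following is a minimal sketch, assuming the sampling settings from the Quick Generation command (28 steps, guidance scale 3.5, seed 42). `build_call_kwargs` and `generate_poster` are hypothetical helper names, and the output filename is arbitrary.

```python
def build_call_kwargs(prompt, num_inference_steps=28, guidance_scale=3.5):
    """Collect keyword arguments for a FluxPipeline call."""
    return {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


def generate_poster(pipe, prompt, seed=42):
    """Run one generation with a fixed seed and return a PIL image."""
    import torch  # imported lazily so build_call_kwargs has no dependencies

    # A CPU generator keeps the seed reproducible regardless of GPU model.
    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(generator=generator, **build_call_kwargs(prompt))
    return result.images[0]


# Example (requires the `pipe` object loaded above):
# image = generate_poster(pipe, "Urban Canvas Street Art Expo poster")
# image.save("poster.png")
```

Note that this path sends the prompt to the model unchanged; it does not include the prompt rewriting that inference.py provides.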

🚀 Quick Generation

For best results, we recommend the provided inference.py script, which includes our prompt rewriting feature. It automatically refines your input prompt before generation to improve the result.

Generate Posters with Precision

Generate a high-quality poster from your prompt using BF16 precision, which lowers memory use compared with FP32.

👉 Get started by visiting our GitHub repository: https://github.com/skylinemusiccds/Imagine

python inference.py \
  --prompt "Urban Canvas Street Art Expo poster with bold graffiti lettering and vibrant, dynamic color splashes capturing the energy of street art." \
  --enable_recap \
  --num_inference_steps 28 \
  --guidance_scale 3.5 \
  --seed 42 \
  --pipeline_path "black-forest-labs/FLUX.1-dev" \
  --custom_transformer_path "Satyam-Singh/Imagine" \
  --qwen_model_path "Qwen/Qwen3-8B"
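
To generate several posters in a row, the command above can be assembled from Python. This is a rough sketch that reuses only the flags shown in the command; `build_inference_command` and `run_batch` are hypothetical helpers, and it assumes `--enable_recap` is a plain on/off flag that can be omitted. How inference.py names its output files is not covered here, so check that before batching to avoid overwriting results.

```python
import subprocess
import sys


def build_inference_command(prompt, script="inference.py", enable_recap=True,
                            steps=28, guidance_scale=3.5, seed=42):
    """Build the argument list for inference.py or inference_offload.py."""
    cmd = [sys.executable, script, "--prompt", prompt]
    if enable_recap:
        # Assumption: the flag takes no value, as in the command above.
        cmd.append("--enable_recap")
    cmd += [
        "--num_inference_steps", str(steps),
        "--guidance_scale", str(guidance_scale),
        "--seed", str(seed),
        "--pipeline_path", "black-forest-labs/FLUX.1-dev",
        "--custom_transformer_path", "Satyam-Singh/Imagine",
        "--qwen_model_path", "Qwen/Qwen3-8B",
    ]
    return cmd


def run_batch(prompts, **kwargs):
    """Run the script once per prompt; raises if any run fails."""
    for prompt in prompts:
        subprocess.run(build_inference_command(prompt, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with long prompts that contain commas or quotes.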

If you are running on a GPU with limited memory, you can use inference_offload.py to offload some components to the CPU:

python inference_offload.py \
  --prompt "Urban Canvas Street Art Expo poster with bold graffiti lettering and vibrant, dynamic color splashes capturing the energy of street art." \
  --enable_recap \
  --num_inference_steps 28 \
  --guidance_scale 3.5 \
  --seed 42 \
  --pipeline_path "black-forest-labs/FLUX.1-dev" \
  --custom_transformer_path "Satyam-Singh/Imagine" \
  --qwen_model_path "Qwen/Qwen3-8B"
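
If you load the pipeline yourself rather than using the script, diffusers offers built-in CPU offload modes. The sketch below shows one way to apply them; it is not necessarily what inference_offload.py does internally, and `apply_cpu_offload` is a hypothetical helper. Both modes require the `accelerate` package.

```python
def apply_cpu_offload(pipe, sequential=False):
    """Enable one of diffusers' CPU offload modes on a loaded pipeline.

    Call this instead of pipe.to("cuda"); the offload hooks move
    components to the GPU on demand.
    """
    if sequential:
        # Lowest memory use, slowest: offloads at the submodule level.
        pipe.enable_sequential_cpu_offload()
    else:
        # Moves whole components (text encoders, transformer, VAE)
        # to the GPU one at a time.
        pipe.enable_model_cpu_offload()
    return pipe
```

Model-level offload is usually the better first choice; sequential offload trades considerably more speed for additional memory savings.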

💻 Gradio Web UI

We provide a Gradio web UI for Imagine. For details, please refer to our GitHub repository. To launch it:

python demo_gradio.py

📊 Performance Benchmarks

📈 Quantitative Results

Method	Text Recall ↑	Text F-score ↑	Text Accuracy ↑
OpenCOLE (Open)	0.082	0.076	0.061
Playground-v2.5 (Open)	0.157	0.146	0.132
SD3.5 (Open)	0.565	0.542	0.497
FLUX.1-dev (Open)	0.723	0.707	0.667
Ideogram-v2 (Closed)	0.711	0.685	0.680
BAGEL (Open)	0.543	0.536	0.463
Gemini2.0-Flash-Gen (Closed)	0.798	0.786	0.746
Imagine (ours)	0.787	0.774	0.735

📝 Citation

If you find Imagine useful for your research, please cite our paper:

@article{llava_imagine_2025,
  title={LLaVA Imagine: Words to Visuals},
  author={Satyam Singh and {UniVerse Ai}},
  year={2025}
}
