---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- Text-to-Image
- ControlNet
- Diffusers
- Flux.1-dev
- image-generation
- Stable Diffusion
base_model: black-forest-labs/FLUX.1-dev
---

## RepText

We present RepText, which aims to empower pre-trained monolingual text-to-image generation models to accurately render, or more precisely, replicate, multilingual visual text in user-specified fonts, without requiring the model to truly understand the text. Specifically, we adopt the ControlNet setting and additionally integrate language-agnostic glyph and position conditions for the rendered text, enabling harmonized visual text generation and allowing users to customize text content, font, and position to their needs. To improve accuracy, a text perceptual loss is employed alongside the diffusion loss. Furthermore, to stabilize the rendering process, at inference we initialize the latent directly from a noisy glyph latent instead of random noise, and adopt region masks to restrict feature injection to the text region, avoiding distortion of the background. Extensive experiments verify the effectiveness of RepText relative to existing works: our approach outperforms existing open-source methods and achieves results comparable to native multilingual closed-source models.
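For intuition, the two inference-time mechanisms described above can be sketched as follows. This is a minimal illustration under assumed diffusers-style APIs (an `AutoencoderKL` VAE and a flow-matching scheduler with `scale_noise`), not the pipeline's actual code; both function names are hypothetical.

```python
import torch

def init_from_glyph(vae, scheduler, glyph_image, timestep, generator=None):
    """Sketch: start denoising from a noised glyph latent rather than pure noise."""
    # Encode the rendered glyph canvas (pixel tensor in [-1, 1]) into latent space
    latents = vae.encode(glyph_image).latent_dist.sample(generator=generator)
    latents = (latents - vae.config.shift_factor) * vae.config.scaling_factor
    # Noise the glyph latent to the starting timestep (flow-matching forward process)
    noise = torch.randn(latents.shape, generator=generator,
                        device=latents.device, dtype=latents.dtype)
    return scheduler.scale_noise(latents, timestep, noise)

def masked_injection(hidden_states, controlnet_residual, region_mask):
    """Sketch: inject ControlNet features only inside the text regions."""
    # region_mask is 1 within the (downsampled) text boxes and 0 elsewhere,
    # so background features are left untouched and the background is not distorted
    return hidden_states + region_mask * controlnet_residual
```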
## ⭐ Update
- [2025/06/07] [Model Weights](https://huggingface.co/Shakker-Labs/RepText) and [inference code](https://github.com/Shakker-Labs/RepText) released!
- [2025/04/28] [Technical Report](https://arxiv.org/abs/2504.19724) released!

## Usage

Please refer to [GitHub](https://github.com/Shakker-Labs/RepText) for details.

```python
import torch
from controlnet_flux import FluxControlNetModel
from pipeline_flux_controlnet import FluxControlNetPipeline
from PIL import Image, ImageDraw, ImageFont
import numpy as np
import cv2
import re
import os

def contains_chinese(text):
    # True if the text contains any CJK unified ideograph
    return re.search(r'[\u4e00-\u9fff]', text) is not None

def canny(img):
    low_threshold = 50
    high_threshold = 100
    img = cv2.Canny(img, low_threshold, high_threshold)
    img = img[:, :, None]
    # invert so edges are black on a white background
    img = 255 - np.concatenate([img, img, img], axis=2)
    return img

base_model = "black-forest-labs/FLUX.1-dev"
controlnet_model = "Shakker-Labs/RepText"

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16
).to("cuda")

## set resolution
width, height = 1024, 1024

## set font
font_path = "./assets/Arial_Unicode.ttf" # use your own font
font_size = 80 # it is recommended to use a font size >= 60
font = ImageFont.truetype(font_path, font_size)

## set text content, position, color
text_list = ["哩布哩布"]
text_position_list = [(370, 200)]
text_color_list = [(255, 255, 255)]

## set controlnet conditions
control_image_list = [] # canny list
control_position_list = [] # position list
control_mask_list = [] # regional mask list
control_glyph_all = np.zeros([height, width, 3], dtype=np.uint8) # all glyphs

## handle each line of text
for text, text_position, text_color in zip(text_list, text_position_list, text_color_list):

    ### glyph image, render text onto a black background
    control_image_glyph = Image.new("RGB", (width, height), (0, 0, 0))
    draw = ImageDraw.Draw(control_image_glyph)
    draw.text(text_position, text, font=font, fill=text_color)

    ### get bbox
    bbox = draw.textbbox(text_position, text, font=font)

    ### position condition
    control_position = np.zeros([height, width], dtype=np.uint8)
    control_position[bbox[1]:bbox[3], bbox[0]:bbox[2]] = 255
    control_position = Image.fromarray(control_position.astype(np.uint8))
    control_position_list.append(control_position)

    ### regional mask, dilated by 5 pixels and clamped to the image bounds
    control_mask_np = np.zeros([height, width], dtype=np.uint8)
    control_mask_np[max(bbox[1]-5, 0):bbox[3]+5, max(bbox[0]-5, 0):bbox[2]+5] = 255
    control_mask = Image.fromarray(control_mask_np.astype(np.uint8))
    control_mask_list.append(control_mask)

    ### accumulate all glyphs onto one canvas (assumes text lines do not overlap)
    control_glyph = np.array(control_image_glyph)
    control_glyph_all += control_glyph

    ### canny condition
    control_image = canny(cv2.cvtColor(np.array(control_image_glyph), cv2.COLOR_RGB2BGR))
    control_image = Image.fromarray(cv2.cvtColor(control_image, cv2.COLOR_BGR2RGB))
    control_image_list.append(control_image)

control_glyph_all = Image.fromarray(control_glyph_all.astype(np.uint8))
control_glyph_all = control_glyph_all.convert("RGB")
# control_glyph_all.save("./results/control_glyph.jpg")

# it is recommended to use words such as 'sign', 'billboard', 'banner' in your prompt
# for English text, it helps to add the text itself to the prompt
prompt = "a street sign in city"
for text in text_list:
    if not contains_chinese(text):
        prompt += f", '{text}'"
prompt += ", filmfotos, film grain, reversal film photography" # optional
print(prompt)
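# The call below passes four types of conditions: canny edges, position maps,
# regional masks, and the accumulated glyph canvas. control_glyph is optional;
# when given, the pipeline uses it to build the initial latent instead of pure
# random noise, which stabilizes rendering (set it to None to disable).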
generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt,
    control_image=control_image_list, # canny
    control_position=control_position_list, # position
    control_mask=control_mask_list, # regional mask
    control_glyph=control_glyph_all, # as init latent, optional, set to None if not used
    controlnet_conditioning_scale=1.0,
    controlnet_conditioning_step=30,
    width=width,
    height=height,
    num_inference_steps=30,
    guidance_scale=3.5,
    generator=generator,
).images[0]

if not os.path.exists("./results"):
    os.makedirs("./results")
image.save("./results/result.jpg")
```

## 📑 Citation

If you find RepText useful for your research and applications, please cite us using this BibTeX:

```bibtex
@article{wang2025reptext,
  title={RepText: Rendering Visual Text via Replicating},
  author={Wang, Haofan and Xu, Yujia and Li, Yimeng and Li, Junchen and Zhang, Chaowei and Wang, Jing and Yang, Kejia and Chen, Zhibo},
  journal={arXiv preprint arXiv:2504.19724},
  year={2025}
}
```

## 📧 Contact

If you have any questions, please feel free to reach us at `haofanwang.ai@gmail.com`.