FLUX.1-schnell-mflux-v0.6.2-4bit

[comparison_output image]

A 4-bit quantized version of the FLUX.1-schnell text-to-image model from Black Forest Labs, implemented using the mflux (version 0.6.2) quantization approach.

Overview

This repository contains a 4-bit quantized version of the FLUX.1-schnell model, which significantly reduces the memory footprint while preserving most of the generation quality. The quantization was performed with the mflux library (v0.6.2).
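
For reference, mflux ships a companion command for exporting quantized checkpoints like this one. The sketch below is based on the mflux README (the mflux-save command and its --quantize flag); flag names can differ between mflux versions, so verify against the README of your installed version.

# quantize FLUX.1-schnell to 4 bits and save it to a local folder (sketch; verify flags for your mflux version)
mflux-save \
    --path "FLUX.1-schnell-4bit" \
    --model schnell \
    --quantize 4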

Original Model

FLUX.1-schnell is the fast, timestep-distilled variant of the FLUX.1 text-to-image model family from Black Forest Labs. It is designed for few-step generation, producing high-quality images in as little as 1-4 inference steps.

Benefits of 4-bit Quantization

  • Reduced Memory Usage: ~85% reduction in memory requirements compared to the original model
  • Faster Loading Times: Smaller model size means quicker initialization
  • Lower Storage Requirements: Significantly smaller disk footprint
  • Accessibility: Can run on consumer hardware with limited memory (e.g. Apple silicon Macs via mflux)

Model Structure

This repository contains the following components:

  • text_encoder/: CLIP text encoder (4-bit quantized)
  • text_encoder_2/: Secondary text encoder (4-bit quantized)
  • tokenizer/: CLIP tokenizer configuration and vocabulary
  • tokenizer_2/: Secondary tokenizer configuration
  • transformer/: Main diffusion model components (4-bit quantized)
  • vae/: Variational autoencoder for image encoding/decoding (4-bit quantized)
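
To fetch all of these components locally (for example to pass a local path to mflux-generate instead of the Hugging Face repo id), the standard huggingface-cli download command, installed with the huggingface_hub package, can be used; the local directory name below is arbitrary.

# download the full quantized repository listed above into a local folder
huggingface-cli download dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit \
    --local-dir FLUX.1-schnell-mflux-v0.6.2-4bit
# the resulting folder can then be passed to mflux-generate via --path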

Usage

Requirements

  • Python
  • PyTorch
  • Transformers
  • Diffusers
  • mflux library (for 4-bit model support)

Installation

pip install torch diffusers transformers accelerate
uv tool install mflux # check mflux README for more details
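
A quick way to confirm the CLI is on your PATH after installation:

mflux-generate --help   # lists the available flags if the install succeeded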

Example Usage

# generate an image with the 4-bit quantized model from this repository
mflux-generate \
    --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --height 1920 \
    --width 1024 \
    --prompt "hot chocolate dish"
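
The same command can be scripted to compare several seeds against one prompt. This is a minimal sketch; the --output flag is taken from the mflux README and should be verified for your installed mflux version.

# generate the same prompt with several seeds (sketch; --output assumed from the mflux README)
for seed in 1 2 3; do
    mflux-generate \
        --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit" \
        --model schnell \
        --steps 2 \
        --seed "$seed" \
        --height 1024 \
        --width 1024 \
        --prompt "hot chocolate dish" \
        --output "schnell_4bit_seed_${seed}.png"
done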

Comparison Output

The images generated from the above prompt with the different model variants are shown at the top.

The fp16 and 8-bit results look nearly identical, while the 4-bit result deviates slightly.

An 8-bit quantized model is also available for comparison.

Performance Comparison

Model Version      Memory Usage   Inference Speed   Quality
Original FP16      ~57 GB         Base              Base
4-bit Quantized    ~9 GB          Slightly slower   Slightly reduced
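
From the numbers above, the 4-bit model uses roughly 9 / 57 ≈ 16% of the original memory, i.e. about an 84% reduction, in line with the ~85% figure quoted earlier.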

Limitations

  • Minor quality degradation compared to the original model
  • Slightly slower inference speed
  • May exhibit occasional artifacts not present in the original model

Acknowledgements

  • Black Forest Labs for creating the original FLUX.1-schnell model
  • Filip Strand for developing the mflux library and its quantization support
  • The Hugging Face team for their Diffusers and Transformers libraries

License

This model inherits the license of the original FLUX.1-schnell model. Please refer to the original model repository for licensing information.
