FLUX.1-schnell-mflux-v0.6.2-8bit

(Image: comparison_output, showing generations from the fp16, 8-bit, and 4-bit models for the example prompt.)

An 8-bit quantized version of the FLUX.1-schnell text-to-image model from Black Forest Labs, produced with the mflux (version 0.6.2) quantization tooling.

Overview

This repository contains an 8-bit quantized version of the FLUX.1-schnell model, which significantly reduces the memory footprint while preserving most of the generation quality. The quantization was performed with mflux v0.6.2.
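For intuition, 8-bit quantization stores each weight as a small integer plus a floating-point scale. The sketch below is purely illustrative (it is not the actual mflux implementation) and shows the basic round-trip with a symmetric per-tensor scale:

import numpy as np

def quantize_8bit(w):
    # Symmetric per-tensor quantization: map float weights onto int8 [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_8bit(w)
print("max reconstruction error:", np.abs(w - dequantize(q, scale)).max())

The int8 tensor needs one byte per weight instead of two for fp16, which is where the roughly 50% memory saving comes from.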

Original Model

FLUX.1-schnell is a fast text-to-image diffusion model developed by Black Forest Labs. It is a timestep-distilled variant of FLUX.1 designed to produce high-quality images in very few inference steps (typically 1-4), making it faster and more efficient to run than many comparable models.

Benefits of 8-bit Quantization

  • Reduced Memory Usage: ~50% reduction in memory requirements compared to the original model
  • Faster Loading Times: Smaller model size means quicker initialization
  • Lower Storage Requirements: Significantly smaller disk footprint
  • Accessibility: Can run on consumer Apple Silicon hardware with limited memory (mflux is built on MLX)
  • Minimal Quality Loss: Maintains nearly identical output quality to the original model

Model Structure

This repository contains the following components:

  • text_encoder/: CLIP text encoder (8-bit quantized)
  • text_encoder_2/: Secondary text encoder (8-bit quantized)
  • tokenizer/: CLIP tokenizer configuration and vocabulary
  • tokenizer_2/: Secondary tokenizer configuration
  • transformer/: Main diffusion model components (8-bit quantized)
  • vae/: Variational autoencoder for image encoding/decoding (8-bit quantized)
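You can enumerate these components without downloading the weights by listing the repository files with huggingface_hub (a general Hub utility, independent of mflux):

from huggingface_hub import list_repo_files

# List every file in this model repository on the Hugging Face Hub.
for filename in list_repo_files("dhairyashil/FLUX.1-schnell-mflux-v0.6.2-8bit"):
    print(filename)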

Usage

Requirements

  • Python
  • PyTorch
  • Transformers
  • Diffusers
  • mflux library (MLX-based, runs on Apple Silicon; provides the 8-bit model support)

Installation

pip install torch diffusers transformers accelerate
uv tool install mflux # check mflux README for more details
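After installation, an optional sanity check confirms the Python dependencies import cleanly:

# Optional: verify the core dependencies import and report their versions.
import torch
import transformers
import diffusers

print(torch.__version__, transformers.__version__, diffusers.__version__)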

Example Usage

# Generate an image using the 8-bit weights from this repository
mflux-generate \
    --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-8bit" \
    --model schnell \
    --steps 2 \
    --seed 2 \
    --height 1920 \
    --width 1024 \
    --prompt "hot chocolate dish"

Comparison Output

The images generated from the above prompt with each model variant are shown at the top of this page.

The fp16 and 8-bit results look nearly identical: the 8-bit version maintains excellent quality while using significantly less memory.

A 4-bit model is also available for comparison, though it shows a more noticeable quality reduction.

Performance Comparison

Model Version      Memory Usage   Inference Speed    Quality
-----------------  -------------  -----------------  ------------------
Original FP16      ~36 GB         Baseline           Baseline
8-bit Quantized    ~18 GB         Nearly identical   Nearly identical
4-bit Quantized    ~9 GB          Slightly slower    Moderately reduced
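The memory column follows from a simple parameters-times-bytes estimate. The sketch below assumes roughly 18B total parameters (transformer plus text encoders and VAE), which is an approximation rather than an official figure:

# Back-of-the-envelope memory estimate: parameters x bytes per parameter.
# ~18e9 total parameters is an assumed approximation for FLUX.1-schnell.
params = 18e9

for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")

This reproduces the ~36 GB / ~18 GB / ~9 GB figures in the table above.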

Other Highlights

  • Minimal quality degradation compared to the original model
  • Nearly identical inference speed
  • Rare artifacts, imperceptible in most use cases

Acknowledgements

  • Black Forest Labs for creating the original FLUX.1-schnell model
  • Filip Strand for developing the mflux quantization methodology
  • The Hugging Face team for their Diffusers and Transformers libraries

License

This model inherits the license of the original FLUX.1-schnell model. Please refer to the original model repository for licensing information.
