# FLUX.1-schnell-mflux-v0.6.2-8bit
An 8-bit quantized version of the FLUX.1-schnell text-to-image model from Black Forest Labs, produced with the mflux (version 0.6.2) quantization tooling.
## Overview

This repository contains an 8-bit quantized version of the FLUX.1-schnell model. Quantization significantly reduces the memory footprint while preserving most of the generation quality. The quantization was performed with mflux (v0.6.2).
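For reference, a quantized copy like this one can be produced with mflux's `mflux-save` command. The sketch below is based on mflux's documented CLI; the output path is illustrative, and exact flags may vary between mflux versions:

```bash
# Quantize FLUX.1-schnell to 8 bits and save a local copy.
# The destination path is an example; choose any writable directory.
mflux-save \
  --path "./FLUX.1-schnell-8bit" \
  --model schnell \
  --quantize 8
```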
## Original Model

FLUX.1-schnell is a fast text-to-image diffusion model developed by Black Forest Labs, distilled to generate images in very few sampling steps. It is faster and more efficient than many larger models while still producing high-quality images.
## Benefits of 8-bit Quantization
- Reduced Memory Usage: ~50% reduction in memory requirements compared to the original model (see the note after this list)
- Faster Loading Times: Smaller model size means quicker initialization
- Lower Storage Requirements: Significantly smaller disk footprint
- Accessibility: Can run on consumer hardware with limited VRAM
- Minimal Quality Loss: Maintains nearly identical output quality to the original model
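The ~50% figure follows directly from per-weight storage: fp16 stores each parameter in 2 bytes, while 8-bit quantization stores it in roughly 1 byte, so the quantized weights take about half the space. The exact ratio varies slightly, since some tensors and metadata may remain in higher precision.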
## Model Structure

This repository contains the following components:

- `text_encoder/`: CLIP text encoder (8-bit quantized)
- `text_encoder_2/`: Secondary text encoder (8-bit quantized)
- `tokenizer/`: CLIP tokenizer configuration and vocabulary
- `tokenizer_2/`: Secondary tokenizer configuration
- `transformer/`: Main diffusion model components (8-bit quantized)
- `vae/`: Variational autoencoder for image encoding/decoding (8-bit quantized)
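To inspect this layout locally, one option (assuming the `huggingface_hub` package, which provides the `huggingface-cli` tool) is:

```bash
# Download the repository and list its top-level components.
huggingface-cli download dhairyashil/FLUX.1-schnell-mflux-v0.6.2-8bit \
  --local-dir ./FLUX.1-schnell-8bit
ls ./FLUX.1-schnell-8bit
```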
## Usage

### Requirements
- macOS on Apple Silicon (mflux is built on Apple's MLX framework)
- Python
- PyTorch
- Transformers
- Diffusers
- mflux library (for 8-bit model support)
### Installation

```bash
pip install torch diffusers transformers accelerate
uv tool install mflux  # check the mflux README for more details
```
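As a quick sanity check that the CLI landed on your PATH, print its usage text:

```bash
mflux-generate --help
```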
### Example Usage

```bash
# Generate an image from this 8-bit model with the mflux CLI.
mflux-generate \
  --path "dhairyashil/FLUX.1-schnell-mflux-v0.6.2-8bit" \
  --model schnell \
  --steps 2 \
  --seed 2 \
  --height 1920 \
  --width 1024 \
  --prompt "hot chocolate dish"
```
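Alternatively, mflux can quantize the full-precision model on the fly at load time via its `--quantize` flag, trading a slower first load for not needing a pre-quantized repository. A sketch:

```bash
# Load the original schnell weights and quantize to 8 bits in memory.
mflux-generate \
  --model schnell \
  --quantize 8 \
  --steps 2 \
  --seed 2 \
  --prompt "hot chocolate dish"
```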
## Comparison Output

The images generated from the above prompt with different model variants are shown at the top of this page. The fp16 and 8-bit results look nearly identical, with the 8-bit version maintaining excellent quality while using significantly less memory. A 4-bit model is also available for comparison, though it shows a more noticeable quality reduction.
## Performance Comparison

| Model Version | Memory Usage | Inference Speed | Quality |
|---|---|---|---|
| Original FP16 | ~36 GB | Baseline | Baseline |
| 8-bit Quantized | ~18 GB | Nearly identical | Nearly identical |
| 4-bit Quantized | ~9 GB | Slightly slower | Moderately reduced |
## Other Highlights

- Minimal quality degradation compared to the original model
- Nearly identical inference speed
- Rare artifacts that are generally imperceptible in most use cases
## Acknowledgements

- Black Forest Labs for creating the original FLUX.1-schnell model
- Filip Strand for developing the mflux quantization methodology
- The Hugging Face team for their Diffusers and Transformers libraries
## License

This model inherits the license of the original FLUX.1-schnell model (Apache 2.0). Please refer to the original model repository for authoritative licensing information.