Original model is https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT

ema-FP8.safetensors contains the original model's weights cast to float8_e4m3fn.
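
As a rough illustration (not the exact script used to produce this repo, which may keep some tensors such as norms in higher precision), a cast like this can be done with safetensors and PyTorch 2.1+:

```python
# Illustrative sketch only: cast the original ema.safetensors weights to float8_e4m3fn.
import torch
from safetensors.torch import load_file, save_file

state = load_file("BAGEL-7B-MoT/ema.safetensors")  # original checkpoint
fp8_state = {
    name: t.to(torch.float8_e4m3fn) if t.is_floating_point() else t
    for name, t in state.items()
}
save_file(fp8_state, "BAGEL-7B-MoT/ema-FP8.safetensors")
```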

Benchmark Spec: 24GB 4090 + 60GB RAM

Default settings, 25 timesteps

| Feature | Speed | GPU VRAM usage | CPU RAM usage |
| --- | --- | --- | --- |
| 📝 Text to Image | 128.90 s | 16.18 GB | 14.22 GB |
| 🖌️ Image Edit | 138.67 s | 15.08 GB | 14.21 GB |
| 🖼️ Image Understanding | 102.68 s | 15.08 GB | 13.66 GB |

Benchmark Images

Support

Runs with less than 12GB of GPU memory.

RAM + VRAM together total about 31 GB.

* The 12GB configuration relies on CPU offload, so it is roughly 1.5x slower than the 24GB configuration (see the offload sketch below).

4070-12GB.jpg
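
The sub-12GB case works because layers that do not fit in VRAM are kept in CPU RAM and streamed in as needed. As a rough illustration of the idea (not the actual app-fp8.py code), Hugging Face accelerate can derive such a split from per-device memory budgets:

```python
# Rough sketch only: derive a GPU/CPU split from memory budgets with accelerate.
# The stand-in model and the "11GiB" budget are illustrative, not values from app-fp8.py.
import torch.nn as nn
from accelerate import infer_auto_device_map, init_empty_weights

with init_empty_weights():  # build the module on the meta device, no real memory used
    stand_in = nn.Sequential(*[nn.Linear(8192, 8192) for _ in range(80)])

device_map = infer_auto_device_map(
    stand_in,
    max_memory={0: "11GiB", "cpu": "20GiB"},  # ~12 GB card plus CPU offload
)
print(device_map)  # layers that exceed the GPU budget are assigned to "cpu"
```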

How to Install:

new environment

  1. git clone https://github.com/bytedance-seed/BAGEL.git
  2. cd BAGEL
  3. conda create -n bagel python=3.10 -y
  4. conda activate bagel

install

  1. install pytorch 2.5.1
    CUDA 12.4
    pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124

  2. pip install flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
    more whls: https://github.com/Dao-AILab/flash-attention/releases
    The flash_attn wheel must match your Python version, PyTorch version, and CUDA version (see the version-check snippet after this list).

  3. pip install -r requirements.txt
    (edit requirements.txt and comment out flash_attn==2.5.8, i.e. change it to #flash_attn==2.5.8, since flash_attn was already installed from the wheel in step 2)

  4. pip install gradio pynvml (pynvml is used to report VRAM stats.)
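
To read off the versions that the wheel filename in step 2 must match, a quick check (just a convenience, not part of the upstream instructions) is:

```python
# Print the Python / PyTorch / CUDA versions the flash_attn wheel name must match,
# e.g. cp310 = Python 3.10, torch2.5 = PyTorch 2.5.x, cu12 = CUDA 12.x.
import sys
import torch

print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
```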

Models & Settings:

  1. Download huggingface.co/ByteDance-Seed/BAGEL-7B-MoT (without ema.safetensors) plus ema-FP8.safetensors from this repo, and arrange them like this (see the download sketch after this list):
folders
├── BAGEL
│   └── app-fp8.py
└── BAGEL-7B-MoT
    └── ema-FP8.safetensors
  2. Open app-fp8.py with Notepad, VS Code, etc.

  3. Change --model_path to your own path:

parser.add_argument("--model_path", type=str, default="/root/your_path/BAGEL-7B-MoT")

  4. Edit the memory budgets for your hardware:

cpu_mem_for_offload = "16GiB"
gpu_mem_per_device = "24GiB"  # default: 24GiB; on a 24GB 4090 you can lower it to 16GiB, but inference will be slower.

  5. Optionally put more LLM layers on the GPU:

NUM_ADDITIONAL_LLM_LAYERS_TO_GPU = 5
# 5 for 24 GB VRAM; try a larger value with 32 GB VRAM.
# The default keeps 10 layers on the GPU; with a 4090 this setting brings it to 15.
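
For step 1, a convenient way to fetch the original repo while skipping its BF16 ema.safetensors is huggingface_hub (optional sketch; the local_dir path is illustrative):

```python
# Optional helper for step 1: download BAGEL-7B-MoT without the BF16 ema.safetensors,
# then fetch ema-FP8.safetensors from this repo into the same folder.
from huggingface_hub import hf_hub_download, snapshot_download

snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",
    local_dir="BAGEL-7B-MoT",
    ignore_patterns=["ema.safetensors"],  # skip the original full-precision checkpoint
)
hf_hub_download(
    repo_id="meimeilook/BAGEL-7B-MoT-FP8",
    filename="ema-FP8.safetensors",
    local_dir="BAGEL-7B-MoT",
)
```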

How to Use:

  1. cd BAGEL
  2. conda activate bagel
  3. python app-fp8.py
  4. Open 127.0.0.1:7860

demo.jpg

