Forge VRAM Offload Optimizer (FVOO)
Working with Flux models in Forge on low-VRAM GPUs leads to out-of-memory crashes when switching models frequently. Residual data from the previous model is not fully cleared, so VRAM usage builds up until it saturates. The traditional fix has been to add the --always-offload-from-vram launch argument, but this unloads the model after every render. That makes larger models like Flux tedious to work with, because each unload/reload cycle takes significant time.
What FVOO does:
FVOO optimizes VRAM usage in Forge by clearing VRAM only when a new model is selected in the WebUI, instead of after every render. This removes the unload/reload delay that --always-offload-from-vram adds between renders that use the same model. Out-of-memory crashes are virtually eliminated, and the time between renders, especially with Flux models, is greatly reduced.
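The core idea, clearing VRAM only when the selected model changes, can be illustrated with a minimal sketch. This is not the contents of custom_offload.py: ModelChangeOffloader, before_render and free_vram are hypothetical names, and the sketch assumes the selected model arrives as a plain string rather than through whatever Forge hook FVOO actually uses. The clearing step uses gc.collect() plus torch.cuda.empty_cache(), which only returns cached memory not held by live tensors; a real implementation would also need to drop references to the old model.

```python
import gc
from typing import Callable, Optional


def free_vram() -> None:
    """Hypothetical clear step: collect garbage, then release PyTorch's
    cached CUDA blocks if torch is installed and a GPU is available."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # no torch: nothing more to release in this sketch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class ModelChangeOffloader:
    """Hypothetical helper: clears VRAM only when the selected model changes,
    so consecutive renders with the same model skip the unload/reload cost."""

    def __init__(self, clear_fn: Callable[[], None] = free_vram) -> None:
        self.clear_fn = clear_fn
        self.current_model: Optional[str] = None
        self.clear_count = 0

    def before_render(self, selected_model: str) -> bool:
        """Call before each render; returns True if VRAM was cleared."""
        changed = (
            self.current_model is not None
            and selected_model != self.current_model
        )
        if changed:
            self.clear_fn()
            self.clear_count += 1
        self.current_model = selected_model
        return changed


if __name__ == "__main__":
    # Stub clear function so the example runs without a GPU.
    offloader = ModelChangeOffloader(clear_fn=lambda: None)
    renders = ["flux-dev", "flux-dev", "sdxl", "sdxl", "flux-dev"]
    print([offloader.before_render(m) for m in renders])
    # prints [False, False, True, False, True]
```

Compared with clearing after every render, only the two model switches in this sequence trigger a clear; the repeated same-model renders reuse whatever is already loaded.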
Installation
- Place custom_offload.py in <forge_root>\scripts\.
- Remove --always-offload-from-vram from your launch arguments.
- Run Forge!
Note: FVOO has not been tested on reForge or A1111, but it will likely work on those platforms as well.