Forge VRAM Offload Optimizer (FVOO)

On low-VRAM GPUs, working with Flux models in Forge eventually leads to out-of-memory crashes when models are changed frequently: residual data is not fully cleared, so VRAM gradually saturates. The traditional fix has been to add the --always-offload-from-vram launch argument. This unloads the model after every render, which makes larger models like Flux tedious to work with because unloading and reloading take significant time.

What FVOO does:

FVOO optimizes VRAM usage in Forge by clearing VRAM only when a new model is selected in the WebUI. This removes the delays that --always-offload-from-vram causes between same-model renders. Out-of-memory crashes are virtually eliminated, and the time between renders, especially with Flux models, is greatly decreased.
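The idea described above can be illustrated with a minimal Python sketch. This is not the contents of custom_offload.py and does not use Forge's API; the class name ModelChangeOffloader, the before_render method, and the unload_fn callback are all hypothetical names chosen for illustration. The sketch only shows the decision logic: remember the last model that rendered and trigger an unload only when the selected model differs.

```python
from typing import Callable, Optional


class ModelChangeOffloader:
    """Illustrative sketch: free VRAM only when the selected model changes.

    All names here are hypothetical; this is not Forge's API.
    """

    def __init__(self, unload_fn: Callable[[], None]) -> None:
        # unload_fn is whatever routine actually frees VRAM (an assumption).
        self._unload_fn = unload_fn
        self._last_model: Optional[str] = None
        self.unload_count = 0

    def before_render(self, model_name: str) -> bool:
        """Call before each render; returns True if an unload was triggered."""
        # The first render has nothing to clear, so it never triggers an unload.
        changed = self._last_model is not None and model_name != self._last_model
        if changed:
            self._unload_fn()
            self.unload_count += 1
        self._last_model = model_name
        return changed


if __name__ == "__main__":
    offloader = ModelChangeOffloader(lambda: print("clearing VRAM"))
    for name in ["flux-dev", "flux-dev", "sdxl", "sdxl", "flux-dev"]:
        offloader.before_render(name)
    print("unloads:", offloader.unload_count)
```

In this sketch, repeated renders with the same model skip the unload entirely, which is where the time saving over --always-offload-from-vram would come from. In a real script, unload_fn would need to call whatever memory-management routine the host UI provides.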

Installation

Place custom_offload.py in <forge_root>\scripts\.
Remove --always-offload-from-vram from your launch arguments.
Run Forge!

Note: FVOO has not been tested on reForge or A1111, but it likely works on those platforms as well.
