Furkan Gözükara

MonsterMMORPG

AI & ML interests

Check out my YouTube channel SECourses for Stable Diffusion tutorials. They will help you tremendously with every topic

Posts

I have done extensive multi-GPU FLUX Full Fine Tuning / DreamBooth training experiments on RunPod using 2x A100 80 GB GPUs (PCIe), since this was commonly asked of me.

Full article here : https://medium.com/@furkangozukara/multi-gpu-flux-fu
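
Kohya GUI ultimately drives sd-scripts through the Hugging Face accelerate launcher, so a 2x GPU run boils down to something like the minimal sketch below; the flux_train.py script name and the config argument are illustrative placeholders, not the exact command the GUI generates.

```python
# Minimal sketch of a 2x GPU launch via the Hugging Face accelerate CLI.
# flux_train.py and the config path are placeholders -- check your
# sd-scripts / Kohya GUI version for the exact script and arguments.
import subprocess

cmd = [
    "accelerate", "launch",
    "--multi_gpu",                 # distributed (DDP) launch
    "--num_processes", "2",        # one process per A100
    "--mixed_precision", "bf16",
    "flux_train.py",               # placeholder training entry point
    "--config_file", "my_exported_config.toml",  # placeholder config
]
subprocess.run(cmd, check=True)
```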

Image 1
Image 1 shows that just the first part of the Kohya GUI installation took 30 minutes on such a powerful machine, on a very expensive Secure Cloud pod (3.28 USD per hour)
There was also a part 2, so the installation alone took a very long time
On Massed Compute, it would take around 2–3 minutes
This is why I suggest you use Massed Compute over RunPod: RunPod machines have terrible hard disk speeds, and getting a good one is a lottery
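
For a rough sense of what slow setup costs, here is a back-of-the-envelope sketch; the 30 minutes and the 3.28 USD per hour rate come from above, the Massed Compute time is the rough estimate above, and the same hourly rate is reused purely to compare setup times (Massed Compute's actual rate differs).

```python
# Rough cost of pod time spent installing instead of training.
HOURLY_RATE_USD = 3.28  # Secure Cloud 2x A100 pod rate quoted above

def setup_cost(setup_minutes: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Money spent while the pod is still setting up."""
    return setup_minutes / 60.0 * hourly_rate

print(f"RunPod, ~30 min for install part 1 alone: ~${setup_cost(30):.2f}")
print(f"Massed Compute, ~3 min estimate:          ~${setup_cost(3):.2f}")
```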



Images 2, 3 and 4
Image 2 shows the speed of our very best FLUX Fine Tuning config (shared below) when doing 2x Multi GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 3 shows the VRAM usage of this config when doing 2x Multi GPU training
Image 4 shows the GPUs of the Pod


Images 5 and 6
Image 5 shows the speed of our very best FLUX Fine Tuning config (shared below) when doing single GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 6 shows the VRAM usage of this setup
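
To compare the 2x GPU run (Images 2 to 4) with the single GPU run (Images 5 and 6), the useful arithmetic is converting seconds per iteration into images per second and then into scaling efficiency. A minimal sketch follows; the s/it values and per-GPU batch size are placeholders (6.26 only mirrors the config file name), so plug in the readings from the screenshots.

```python
# Sketch: turning seconds/iteration readings into a multi-GPU scaling figure.
# The s/it values and batch size below are placeholders, not the measured
# numbers from the screenshots -- substitute the real readings.

def images_per_second(sec_per_it: float, batch_per_gpu: int, num_gpus: int) -> float:
    """Total images processed per second across all GPUs."""
    return batch_per_gpu * num_gpus / sec_per_it

single_gpu = images_per_second(sec_per_it=6.26, batch_per_gpu=1, num_gpus=1)  # placeholder
dual_gpu = images_per_second(sec_per_it=7.00, batch_per_gpu=1, num_gpus=2)    # placeholder

speedup = dual_gpu / single_gpu
efficiency = speedup / 2  # ideal scaling for 2 GPUs would be 2.0x
print(f"Speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```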


Images 7 and 8
Image 7 shows the speed of our very best FLUX Fine Tuning config (shared below) when doing single GPU training with Gradient Checkpointing disabled
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 8 shows the VRAM usage of this setup
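
Since Images 7 and 8 isolate the effect of turning Gradient Checkpointing off, here is a generic PyTorch sketch of the trade-off that setting controls (illustrative only, not Kohya's FLUX code): with checkpointing on, activations are recomputed during the backward pass, which lowers VRAM but adds compute per step; with it off, activations are kept in memory, which is faster but needs more VRAM.

```python
# Generic PyTorch sketch of the gradient checkpointing trade-off
# (illustrative only -- not Kohya's FLUX training code).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class Model(nn.Module):
    def __init__(self, use_checkpointing: bool):
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(8))
        self.use_checkpointing = use_checkpointing

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Activations are NOT stored; they are recomputed in backward.
                # Lower VRAM, slower step time.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                # Activations are stored for backward: faster, but more VRAM.
                x = block(x)
        return x

x = torch.randn(4, 1024, requires_grad=True)
for flag in (True, False):
    model = Model(use_checkpointing=flag).train()
    model(x).sum().backward()
```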


....
Single Block / Layer FLUX LoRA Training Research Results and LoRA Network Alpha Change Impact With LoRA Network Rank Dimension

Full article posted here : https://medium.com/@furkangozukara/single-block-layer-flux-lora-training-research-results-and-lora-network-alpha-change-impact-with-e713cc89c567

Conclusions
As expected, when you train fewer parameters, e.g. LoRA vs Full Fine Tuning or Single Block LoRA vs all Blocks LoRA, quality gets reduced
Of course you gain some extra VRAM savings and also a smaller size on disk
Moreover, fewer parameters reduce the overfitting and realism of the FLUX model, so if you are into stylized outputs like comics, it may work better
Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new Learning Rate research (the alpha / rank scaling behind this is sketched after this list)
Finally, the very best quality and least overfitting are achieved with full Fine Tuning
Full fine tuning configs and instructions > https://www.patreon.com/posts/112099700
The second best option is extracting a LoRA from the Fine Tuned model if you need a LoRA
Check the last columns of figure 3 and figure 4: I set the extracted LoRA Strength / Weight to 1.1 instead of 1.0
Extract LoRA guide (public article) : https://www.patreon.com/posts/112335162
The third best is doing a regular all-layers LoRA training
Full guide, configs and instructions > https://www.patreon.com/posts/110879657
And the worst quality comes from training fewer blocks / layers with LoRA
Full configs are included in > https://www.patreon.com/posts/110879657
So how much VRAM and speed does single block LoRA training save?
All layers 16 bit is 27700 MB (4.85 second / it) and 1 single block is 25800 MB (3.7 second / it)
All layers 8 bit is 17250 MB (4.85 second / it) and 1 single block is 15700 MB (3.8 second / it)
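
Two bits of arithmetic behind the conclusions above, as a sketch. In the standard LoRA formulation used by trainers such as kohya sd-scripts, the low-rank update is scaled by Network Alpha / Network Rank, so alpha and rank together set the effective update strength the Learning Rate was tuned against; the alpha and rank values below are illustrative, not the post's configs. The second part just reads the single-block savings off the 16 bit numbers quoted above.

```python
# 1) Standard LoRA scaling: delta_W = (alpha / rank) * B @ A, so alpha and
#    rank jointly determine the effective update strength the LR was tuned for.
def lora_scale(network_alpha: float, network_rank: int) -> float:
    return network_alpha / network_rank

print(lora_scale(16, 32))  # example: rank 32, alpha 16 -> scale 0.5
print(lora_scale(16, 8))   # example: rank 8,  alpha 16 -> scale 2.0

# 2) Single block vs all layers, using the 16 bit numbers quoted above.
vram_saved_mb = 27700 - 25800   # 1900 MB less VRAM
step_speedup = 4.85 / 3.7       # ~1.31x faster per iteration
print(f"16 bit: {vram_saved_mb} MB saved, {step_speedup:.2f}x faster per step")
```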
Image Raw Links
Figure 0 : MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests