This content originally appeared on DEV Community and was authored by Furkan Gözükara
Info
- As you know, I have finalized and perfected my FLUX Fine Tuning and LoRA training workflows until something new arrives
- Both are exactly the same; we only load the LoRA config into the LoRA tab of Kohya GUI and the Fine Tuning config into the Dreambooth tab
- When we use Classification / Regularization images, Fine Tuning actually becomes Dreambooth training, as you know
- However, with FLUX, Classification / Regularization images do not help, as I have shown previously with grid experimentations
- FLUX LoRA training configs and details : https://www.patreon.com/posts/110879657
  - Full tutorial video : https://youtu.be/nySGu12Y05k
  - Full cloud tutorial video : https://youtu.be/-uhL2nW7Ddw
- FLUX Fine Tuning configs and details : https://www.patreon.com/posts/112099700
- So what is up with Single Block FLUX LoRA training?
- The FLUX model is composed of 19 double blocks and 38 single blocks
- 1 double block takes around 640 MB of VRAM and 1 single block around 320 MB in 16-bit precision when doing a Fine Tuning training (see the rough estimate sketch after this list)
  - We have configs for 16 GB, 24 GB and 48 GB GPUs, all the same quality; only the speed is different
- Normally we train a LoRA on all of the blocks
- However, it was claimed that you can train a single block and still get good results
- So I have researched this thoroughly and I am sharing all the info in this article
- Moreover, I decided to reduce the LoRA Network Rank (Dimension) of my workflow and test the impact of keeping the same Network Alpha or scaling it proportionally
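To put those per-block numbers in perspective, here is a minimal back-of-the-envelope sketch. It assumes only the approximate per-block figures quoted above and counts nothing else (no text encoders, VAE, optimizer states or activations), so treat it as a rough illustration rather than a real VRAM measurement.

```python
# Rough VRAM estimate for the FLUX transformer blocks alone, in 16-bit,
# using the approximate per-block numbers quoted above (illustration only).
DOUBLE_BLOCKS = 19
SINGLE_BLOCKS = 38
MB_PER_DOUBLE = 640   # approx. 16-bit weight memory per double block
MB_PER_SINGLE = 320   # approx. 16-bit weight memory per single block

total_mb = DOUBLE_BLOCKS * MB_PER_DOUBLE + SINGLE_BLOCKS * MB_PER_SINGLE
print(f"All blocks       : ~{total_mb} MB (~{total_mb / 1024:.1f} GB)")
print(f"One double block : ~{MB_PER_DOUBLE} MB")
print(f"One single block : ~{MB_PER_SINGLE} MB")
# All blocks       : ~24320 MB (~23.8 GB)
```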
 
Experimentation Details and Hardware
- We are going to use Kohya GUI
- Full tutorial on how to install it, use it and train with it : https://youtu.be/nySGu12Y05k
- Full tutorial for cloud services : https://youtu.be/-uhL2nW7Ddw
- I have used my classical 15-image experimentation dataset
- I have trained 150 epochs, thus 2250 steps (see the quick sanity check after this list)
- All experiments were done on a single RTX A6000 48 GB GPU (almost the same speed as an RTX 3090)
- In all experiments I have trained CLIP-L as well, except in Fine Tuning (you can't train it yet)
- I know the dataset doesn't have expressions, but that is not the point; you can see my 256-image training results with the exact same workflow here : https://www.reddit.com/r/StableDiffusion/comments/1ffwvpo/tried_expressions_with_flux_lora_training_with_my/
- So I research a workflow, and when you use a better dataset you get even better results
- I will give full links to the figures, so click them to download and see the full resolution
- Figure 0 is the first uploaded image, and so on with the numbers
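A quick sanity check of that step count, assuming a batch size of 1 and 1 repeat per image (my assumption for illustration; the actual values come from the training config):

```python
# 150 epochs over a 15-image dataset -> 2250 steps,
# assuming batch size 1 and 1 repeat per image (assumed values).
images = 15
repeats = 1       # assumed
batch_size = 1    # assumed
epochs = 150

steps_per_epoch = (images * repeats) // batch_size
total_steps = steps_per_epoch * epochs
print(total_steps)  # 2250
```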
 
Research of 1-Block Training
- I used my exact same settings and first trained double blocks 0-7 and single blocks 0-15, one block at a time, to determine whether the choice of block matters a lot, using the same learning rate as my full-layers LoRA training (a conceptual sketch of restricting a LoRA to a single block follows this list)
- The 0-7 double block results can be seen in Figure_0.jfif and the 0-15 single block results in Figure_1.jfif
- I didn't notice a very meaningful difference, and the learning rate was also too low, as can be seen from the figures
- Still, I picked single block 8 as the best one to expand the research
- Then I trained 8 different learning rates on single block 8 and determined the best learning rate, as shown in Figure_2.jfif
- It required more than 10 times the learning rate of a regular all-blocks FLUX LoRA training
- Then I decided to test combinations of different single blocks / layers and see their impact
- As can be seen in Figure_3.jfif, I tried combinations of 2 to 11 different layers
- As the number of trained layers increased, it obviously required a newly tuned learning rate
- Thus I decided not to go any further at the moment, because single-layer training will obviously yield sub-par results and I don't see much benefit in it
- In all cases : Full FLUX Fine Tuning > LoRA extraction from a fully Fine Tuned FLUX model > full-layers LoRA training > reduced-layers FLUX LoRA training
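For readers who want to see what "training only single block 8" means in code, here is a minimal conceptual sketch using diffusers + peft. This is not the Kohya GUI workflow used for the experiments above (Kohya has its own block-selection options and internal module names); the model id, module-name regex and rank/alpha values are assumptions for illustration.

```python
# Conceptual sketch: attach LoRA adapters only to FLUX single block 8.
# Not the Kohya workflow from this article; module names follow the
# diffusers FluxTransformer2DModel naming and are assumptions.
import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # assumed base model
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# Regex matching only the Linear layers inside single block 8.
lora_config = LoraConfig(
    r=128,             # Network Rank (Dimension), as in the article's workflow
    lora_alpha=128,    # Network Alpha
    init_lora_weights="gaussian",
    target_modules=r"single_transformer_blocks\.8\..*(to_q|to_k|to_v|proj_mlp|proj_out)",
)
transformer.add_adapter(lora_config)

# Only the LoRA parameters of that one block should now require gradients.
trainable = [n for n, p in transformer.named_parameters() if p.requires_grad]
print(len(trainable), trainable[:4])
```

Training then proceeds as usual; only the parameters listed in `trainable` receive gradient updates.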
 
Research of Network Alpha Change
- In my very best FLUX LoRA training workflow I use a LoRA Network Rank (Dimension) of 128
- The impact of this is that the generated LoRA files are bigger
- It keeps more information but also causes more overfitting
- So, with some tradeoffs, this LoRA Network Rank (Dimension) can be reduced
- Normally my workflow uses Network Rank (Dimension) 128 / Network Alpha 128
- The Network Alpha scales the LoRA weight updates, so changing it effectively changes the Learning Rate (see the sketch after this list)
- We also already know, from the experiments above and from the FLUX Full Fine Tuning experiments, that training more parameters requires a lower Learning Rate
- So when we reduce the LoRA Network Rank (Dimension), what should we do so that the effective Learning Rate does not change?
- This is where the Network Alpha comes into play
- Should we scale it down with the rank or keep it as it is?
- Thus I experimented with LoRA Network Rank (Dimension) 16 / Network Alpha 16 and with 16 / 128
- So in one experiment I kept the Network Alpha as it is (16 / 128) and in the other I scaled it down with the rank (16 / 16)
- The results are shared in Figure_4.jpg
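A minimal sketch of why Network Alpha behaves like a Learning Rate multiplier, assuming the usual LoRA convention (used by Kohya-style trainers) where the low-rank update is scaled by alpha / rank:

```python
# LoRA forward pass (conventional form): y = W x + (alpha / rank) * B(A(x))
# The constant alpha / rank multiplies every LoRA update, so it acts as an
# effective learning-rate scale on the trained weights.
def effective_scale(rank: int, alpha: float) -> float:
    return alpha / rank

for rank, alpha in [(128, 128), (16, 16), (16, 128)]:
    print(f"rank {rank:>3} / alpha {alpha:>3} -> scale {effective_scale(rank, alpha):.2f}")

# rank 128 / alpha 128 -> scale 1.00  (the baseline workflow)
# rank  16 / alpha  16 -> scale 1.00  (alpha scaled down with the rank)
# rank  16 / alpha 128 -> scale 8.00  (alpha kept at its original value)
```

Keeping Alpha at 128 while dropping the rank to 16 therefore raises the effective update scale 8x, which lines up with the earlier observation that training fewer parameters needs a higher learning rate.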
 
Conclusions
- As expected, when you train fewer parameters, e.g. LoRA vs. Full Fine Tuning or single-block LoRA vs. all-blocks LoRA, your quality gets reduced
- Of course you gain some VRAM savings and also some reduced size on disk
- Moreover, fewer parameters reduce the overfitting and the realism of the FLUX model, so if you are into stylized outputs like comics, it may work better
- Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new Learning Rate research
- Finally, the very best quality and the least overfitting are achieved with full Fine Tuning
  - Check the last columns of Figure 3 and Figure 4; I set the extracted LoRA Strength / Weight to 1.1 instead of 1.0
  - Full fine tuning configs and instructions : https://www.patreon.com/posts/112099700
- Second best is extracting a LoRA from the Fine Tuned model, if you need a LoRA (a conceptual extraction sketch appears at the end of this section)
  - Check the last columns of Figure 3 and Figure 4; I set the extracted LoRA Strength / Weight to 1.1 instead of 1.0
  - Extract LoRA guide (public article) : https://www.patreon.com/posts/112335162
- Third is doing an all-layers regular LoRA training
  - Full guide, configs and instructions : https://www.patreon.com/posts/110879657
- And the worst quality comes from training fewer blocks / layers with LoRA
  - Full configs are included in : https://www.patreon.com/posts/110879657
- So how much VRAM and speed does single block LoRA training save? (a rough total-time estimate follows this list)
  - All layers in 16-bit : 27700 MB at 4.85 seconds / it, versus 25800 MB at 3.7 seconds / it for 1 single block
  - All layers in 8-bit : 17250 MB at 4.85 seconds / it, versus 15700 MB at 3.8 seconds / it for 1 single block
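To make that speed difference concrete, here is a rough total-time estimate for the 2250-step runs used in these experiments, based only on the seconds-per-iteration numbers above:

```python
# Approximate wall-clock time for a 2250-step run at the measured speeds.
steps = 2250
for label, sec_per_it in [("all layers (16-bit or 8-bit)", 4.85),
                          ("1 single block, 16-bit", 3.7),
                          ("1 single block, 8-bit", 3.8)]:
    hours = steps * sec_per_it / 3600
    print(f"{label:30s} ~{hours:.1f} hours")
# all layers (16-bit or 8-bit)   ~3.0 hours
# 1 single block, 16-bit         ~2.3 hours
# 1 single block, 8-bit          ~2.4 hours
```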
 
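For context on the "extract a LoRA from the Fine Tuned model" option above, the core idea is to take the weight difference between the fine-tuned and base models and factorize it with a truncated SVD. The sketch below shows that idea for a single linear layer; it is not the Kohya extraction script the linked guide uses, and the shapes and rank are placeholders.

```python
# Conceptual LoRA extraction: low-rank factorization of (fine_tuned - base)
# for one linear layer's weight matrix. Real extraction tooling repeats this
# (plus naming and bookkeeping) over every supported layer of the model.
import torch

def extract_lora(base_w: torch.Tensor, tuned_w: torch.Tensor, rank: int = 128):
    """Return (down, up) such that up @ down approximates tuned_w - base_w."""
    delta = (tuned_w - base_w).float()               # (out_features, in_features)
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    up = u[:, :rank] * s[:rank]                      # (out_features, rank)
    down = vh[:rank, :]                              # (rank, in_features)
    return down, up

# Hypothetical weights just to show the shapes (3072 is FLUX's hidden size).
base = torch.randn(3072, 3072)
tuned = base + 0.01 * torch.randn(3072, 3072)
down, up = extract_lora(base, tuned, rank=128)
delta = tuned - base
rel_err = (up @ down - delta).norm() / delta.norm()
print(down.shape, up.shape, f"relative error ~{rel_err:.2f}")
```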
Image Raw Links
- Figure 0 : https://huggingface.co/MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests/resolve/main/Figure_0.jfif
- Figure 1 : https://huggingface.co/MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests/resolve/main/Figure_1.jfif
- Figure 2 : https://huggingface.co/MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests/resolve/main/Figure_2.jfif
- Figure 3 : https://huggingface.co/MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests/resolve/main/Figure_3.jfif
- Figure 4 : https://huggingface.co/MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests/resolve/main/Figure_4.jpg
 
Figures
- See the raw links above for Figure 0 through Figure 4