AbstractPhil posted an update 3 days ago
My in-development Surge training methodology and paradigm are powerful. Preliminary tests will be available for debugging soon, using a customized sd-scripts and a series of full finetunes that use SDXL as a catalyst for the training paradigm.
https://civitai.com/articles/14195/the-methodology-of-surge-training-loss-math
The datasets I'm sourcing will act as catalysts and tests of Surge's ability to teach very sticky or difficult-to-learn elements - such as text, positioning, offset, and ControlNet poses - directly into the very stubborn SDXL infrastructure, without additional tools.
It should be noted that my currently running finetunes based on BeatriXL are not Surge-trained, so you won't gain any insight into Surge from them.

GPT and I have prototyped a new version of SD15 that operates with additional attention heads to match the Surge formula, the reformed Omega-ViT-L, a zeroed UNet, and the Flux 16-channel AE.
I'll call it SD-SURGE, as it's not SD15 anymore.
The first Surge trainings are already under way.
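
For concreteness, here is a minimal sketch of how that kind of assembly could look with diffusers. The repo paths, channel counts, and head counts are placeholders and assumptions, not the actual SD-SURGE configuration.

```python
# Hedged sketch: assemble an SD15-style UNet with wider attention heads,
# zero it, and pair it with a 16-channel AE and a custom text encoder.
# All paths and hyperparameters below are assumptions for illustration.
import torch
from diffusers import UNet2DConditionModel, AutoencoderKL
from transformers import CLIPTextModel, CLIPTokenizer

unet = UNet2DConditionModel(
    sample_size=64,
    in_channels=16,            # assumption: 16-channel latents instead of SD15's 4
    out_channels=16,
    cross_attention_dim=768,   # SD15 pairs with a 768-dim CLIP ViT-L text encoder
    attention_head_dim=16,     # assumption: extra heads "to match the Surge formula"
)

# Zero the UNet so training starts from a blank slate.
with torch.no_grad():
    for p in unet.parameters():
        p.zero_()

# Swap in the external components; paths are placeholders.
vae = AutoencoderKL.from_pretrained("path/to/flux-16-channel-ae")      # 16-channel AE
text_encoder = CLIPTextModel.from_pretrained("path/to/omega-vit-l")    # reformed ViT-L
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
```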

Simply put, I'm not that good with SD15. I got a few cool things working, but training a zeroed model isn't my forte. I'm better at enhancing or improving than at building entire structures from scratch. My limit here is almost strictly a logistical one: the libraries don't like SD15, and they especially don't like it when I start tinkering with internals and recording information from layer activations.
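
For the activation-recording piece, a plain PyTorch forward hook is usually enough; the sketch below is generic, and the module-name filter is an assumption rather than anything Surge-specific.

```python
# Hedged sketch: record the outputs of matching layers during a forward pass.
import torch.nn as nn

def attach_recorders(model: nn.Module, name_filter: str = "attn"):
    """Register forward hooks that stash each matching layer's output."""
    records, handles = {}, []
    for name, module in model.named_modules():
        if name_filter in name:
            def hook(mod, inputs, output, key=name):
                out = output[0] if isinstance(output, tuple) else output
                records[key] = out.detach().cpu()
            handles.append(module.register_forward_hook(hook))
    return records, handles

# Usage: run one inference pass, inspect `records`, then remove the hooks.
# records, handles = attach_recorders(unet, name_filter="attn2")
# _ = unet(latents, timestep, encoder_hidden_states=text_embeds)
# for h in handles:
#     h.remove()
```
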
I need additional fundamental tests and foundational documentation before I take my SD15 finetune from baseline to useful, as training still tends to be quite vague to me. Taking SD15 from zero to a useful state requires more tests, practice, and experience - enough to train a full model from zero to a state anchorable enough for Surge to latch onto the necessary points of interest. Those points of interest won't heat up without sufficient attachment and consistency during inference, and they are needed to build the web of information that advances a model from point A to point B without simply replacing the neurons themselves and making a flat copy.
There is other potential here as well - implanting neurons and then finetuning has shown very good results. However, this isn't about that; this is about showcasing the power of the baseline system.

Hence the other end of the spectrum: where the model already exists and has some fairly well-tested capability, I can interpolate between two exact classifier models - teacher and student - using direct layer-to-layer learning, with optimizations for time and speed. Classifiers can be small, lightweight, and fully capable of anchorpoint information transfer as well.
So instead of a full 1.5 finetune with various extra improvements - where the plan was to allow interpolation from a much stronger model into a much less robust one - I plan to take a classifier that is fully trained, take another that has never seen the data, and attempt to interpolation-train this unfinished, imperfect classifier. The goal is to introduce the necessary information, using Surge and anchorpoints, into neurons that were already trained on different data. This process must be repeatable and useful in realms beyond the simple 1:1 case, which is the point of an adapter like Surge.
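
As a rough illustration of what direct layer-to-layer teacher/student learning can look like, here is a generic hidden-state matching loss in PyTorch. It assumes the two classifiers expose matched intermediate features; it is plain feature distillation, not the actual Surge/anchorpoint math from the linked article.

```python
# Hedged sketch: combine the ordinary task loss with an MSE "anchor" term
# that pulls each student layer toward the corresponding teacher layer.
import torch.nn.functional as F

def layer_to_layer_loss(student_feats, teacher_feats, logits, labels, alpha=0.5):
    """student_feats / teacher_feats: lists of tensors from matched layers."""
    anchor = sum(
        F.mse_loss(s, t.detach())
        for s, t in zip(student_feats, teacher_feats)
    ) / len(student_feats)
    task = F.cross_entropy(logits, labels)
    return alpha * anchor + (1.0 - alpha) * task
```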

I'd say a fair, fully repeatable notebook is in order, using simple processes and a simple classifier layer set. I'll choose well-known models whose training intentionally omits something, and introduce that training using another model trained in the notebook itself. No tricks, no barriers, nothing special. Just Keras, a bit of code, a bit of research, and a bit of elbow grease.
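
As a sketch of what that notebook could look like: the snippet below trains a small Keras teacher on the full dataset, trains a student with two classes intentionally omitted, and then pulls the teacher's intermediate features into the student on the held-out classes. The dataset, architecture, and class split are assumptions, and the transfer step is plain feature matching standing in for the Surge step.

```python
# Hedged sketch: the teacher sees everything, the student intentionally skips
# digits 8 and 9, then the student's "feat" layer is pulled toward the teacher's.
from tensorflow import keras

def build_classifier():
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu", name="feat"),
        keras.layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
keep = y_train < 8  # the student never sees digits 8 and 9

teacher, student = build_classifier(), build_classifier()
teacher.compile("adam", "sparse_categorical_crossentropy", ["accuracy"])
student.compile("adam", "sparse_categorical_crossentropy", ["accuracy"])
teacher.fit(x_train, y_train, epochs=1, verbose=0)
student.fit(x_train[keep], y_train[keep], epochs=1, verbose=0)

# Transfer step: match the student's intermediate features to the teacher's
# on the classes the student has never seen (a stand-in for the Surge step).
t_feat = keras.Model(teacher.input, teacher.get_layer("feat").output)
s_feat = keras.Model(student.input, student.get_layer("feat").output)
x_missing = x_train[~keep]
s_feat.compile("adam", "mse")
s_feat.fit(x_missing, t_feat.predict(x_missing, verbose=0), epochs=1, verbose=0)
```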
This will showcase the power of Surge while simultaneously introducing a new type of rapid interpolative learning.
