AbstractPhil posted an update May 18
Forcefeeding masked T5-Small 1 billion human-association captions to fry its brain. I really don't know how long it'll take until I start, nor do I know the logistical challenges I'll face when moving data from A to B, but the outcome should completely fry it and make it fixate only on human and diffusion responses. Should be a fun experiment that can just kind of run on automation.
The experiment's captions are available... mostly on my HF. I've had some rate-limit problems that caused generation to halt, and I think I need to autogen another 100 million complex captions.
This WILL form heavy bias and burn-points. Random words will be peppered in the mix to allow the T5-Small to retain at least some semblance of what it was before I lobotomize it.
Likely I'll completely freeze half and burn the other half for a couple million as a test point, and see how it takes, or whether it dies before 50k or so and needs a refined process.
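In transformers terms, the freeze-half step could look something like the sketch below; freezing the encoder blocks while leaving the decoder trainable is an assumption about where the split lands, not a stated recipe.

```python
# Sketch: freeze "half" of T5-Small (the encoder stack) and burn the rest.
# The choice of which half to freeze is an assumption for illustration.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Freeze the encoder's transformer blocks and its final layer norm; the
# shared embedding table and the whole decoder remain trainable.
for param in model.encoder.block.parameters():
    param.requires_grad = False
for param in model.encoder.final_layer_norm.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters")
```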
Oh great, even better. It didn't include the longer prompt variations. This won't start today.

Alright, training began. I'm introducing a high degree of noise and chatter for the T5 to learn to bypass, while simultaneously increasing the additional information output by the T5 in the process.
So far the outcome has been the introduction of some new information into the output, while simultaneously introducing rule-of-3 parameterization into the T5-Small.
I have high hopes.
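As a rough illustration of the noise-and-chatter idea (the rate and the way random tokens are drawn are assumptions, not the exact pipeline): random tokens get peppered into the input caption, and the clean caption stays the label, so the model has to learn to see through the chatter.

```python
# Sketch: pepper random "chatter" tokens into a caption; train with the noisy
# string as input and the clean string as the label. Noise rate and token
# sampling are illustrative assumptions.
import random
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

def add_chatter(caption, noise_rate=0.15, seed=None):
    rng = random.Random(seed)
    noisy = []
    for word in caption.split():
        noisy.append(word)
        if rng.random() < noise_rate:
            rand_id = rng.randrange(tokenizer.vocab_size)
            noisy.append(tokenizer.convert_ids_to_tokens(rand_id).lstrip("\u2581"))
    return " ".join(noisy)

clean = "a brightly lit room completely filled with many tacos"
print(add_chatter(clean, noise_rate=0.3, seed=0))
# Training pair: add_chatter(clean) as the input, clean as the target.
```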

Try 2 seems to be normalizing around the intentionally low loss value I set up through the combination of scaled weights and careful use of alternating special tokens, meant to teach models like the T5 how to behave intrinsically, in line with the original paper and the research done on SentencePiece.
The slow bleed should help preserve a large portion of the internal structure while slowly attenuating it and reshaping the weights of the high/low curve around each batch directly, creating a bit of a cascading 4D bleed effect: one middle-ground high-end topic swapped per one middle-ground low-end topic's valuation for learning and rephrasing.
In the process I've introduced inverse weighting to account for too many of one token or too many of another, while simultaneously boosting the power of the lowest without overfitting everything to the lowest on a minimal scale, and reducing the overall effect of the highest-counted token. This helps keep everything from overfitting to the generic flood of frequent tokens, while allowing training on much smaller captions without completely obliterating the structure of the T5-Small in less than a few thousand steps.
Additionally, the highest and lowest tokens are automatically weighted up or down, and once a token is masked, the weighting automatically rescales the structure around the variant being masked and attended to.
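A bare-bones version of that inverse weighting could look like the loss below: per-batch token counts drive the weights, frequent tokens are pushed down, rare ones up, and a clamp keeps either end from dominating. The clamp range and normalization are assumptions, not the values used in the run.

```python
# Sketch: inverse token-frequency weighting for the LM loss. Clamp bounds and
# normalization are assumptions for illustration.
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, labels, ignore_index=-100, clamp_min=0.5, clamp_max=2.0):
    # logits: (batch, seq, vocab); labels: (batch, seq)
    vocab_size = logits.size(-1)
    flat_logits = logits.view(-1, vocab_size)
    flat_labels = labels.view(-1)
    mask = flat_labels != ignore_index
    valid = flat_labels[mask]

    # How often each label token appears in this batch.
    counts = torch.bincount(valid, minlength=vocab_size).float()
    freq = counts[valid]
    # Rare tokens get weights > 1, flooded tokens get weights < 1, clamped.
    weights = (freq.mean() / freq).clamp(clamp_min, clamp_max)

    per_token = F.cross_entropy(flat_logits[mask], valid, reduction="none")
    return (per_token * weights).sum() / weights.sum()
```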

Over time this will teach it to predict the middle-curve portions of variant captions, allowing it to eventually conform to the normalization of the new "middle-ground" prediction, which is purposed to swap elements from one caption to another. This should allow a very high learning rate without complete destruction of the T5-Small, thanks to the various regularization techniques engineered for this task today.
This process taps into training methods similar to those originally applied to the T5-Small, as well as introducing newer methods designed and implemented on much larger-scale model training, with weighted substructures meant to not rip the arms and legs off of this little model.

It won't do exactly what I want it to do yet, but that's where the high-complexity captions will come into play. They are based entirely on compartmentalizing the critical systems into usable sub-sections, for example:
Example prompt: a room of tacos
Potential goal: a brightly lit room completely filled with many tacos
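Fed to the model as an ordinary seq2seq pair, that example might look like the snippet below; the "expand:" prefix is only an assumed task marker for illustration.

```python
# Sketch: short prompt in, expanded caption as the label. The "expand:" task
# prefix is an assumption, not part of the released data format.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompt = "expand: a room of tacos"
target = "a brightly lit room completely filled with many tacos"

inputs = tokenizer(prompt, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Loss for one training example (batching, optimizer, etc. omitted).
loss = model(**inputs, labels=labels).loss
print(float(loss))
```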

It works. I've devised a methodology for token injection using an adapter trained directly between the T5 and CLIP_L to guide the CLIP_L in useful ways.

This variation is based on taking only the T5 as input at runtime, with the adapter attempting to approximate the guidance the ViT-L/14 would need, via an associative loss at training time.
That outcome wasn't robust enough and overall lacked the detail an adapter of this nature requires, so I recreated the adapter to accept both the T5-Small and ViT-L/14 as inputs. The outcomes are substantially more stable. The adapter weights and results are posted as t5-vit-14-v1.
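For a mental model of the dual-input setup: a small adapter takes pooled T5-Small states plus CLIP ViT-L/14 text features and maps them into a guidance vector, trained against a target with an associative-style loss. The widths, pooling, and loss below are assumptions, not the architecture of the posted t5-vit-14-v1 weights.

```python
# Sketch: dual-input adapter (T5-Small + CLIP ViT-L/14 -> guidance vector),
# trained with a simple MSE + cosine "associative" loss. Dimensions and layout
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class T5ClipAdapter(nn.Module):
    def __init__(self, t5_dim=512, clip_dim=768, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(t5_dim + clip_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, clip_dim),
        )

    def forward(self, t5_pooled, clip_feats):
        return self.net(torch.cat([t5_pooled, clip_feats], dim=-1))

def associative_loss(pred, target):
    # Match the target both in magnitude (MSE) and direction (cosine).
    return F.mse_loss(pred, target) + (1 - F.cosine_similarity(pred, target, dim=-1)).mean()

adapter = T5ClipAdapter()
t5_pooled = torch.randn(4, 512)    # mean-pooled T5-Small encoder states
clip_feats = torch.randn(4, 768)   # CLIP ViT-L/14 text features
target = torch.randn(4, 768)       # guidance the diffusion side expects
loss = associative_loss(adapter(t5_pooled, clip_feats), target)
```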
