Try 2 seems to be normalizing around the intentionally low loss target I set up, built on the combination of scaled token weights and careful use of alternating special tokens meant to teach models like T5 how to behave intrinsically, in line with the original paper and the research behind SentencePiece.
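The "alternating special tokens" here read like T5's sentinel tokens (<extra_id_0>, <extra_id_1>, ...), which alternate between the masked spans in the input and the spans to be predicted in the target. A minimal sketch, assuming the Hugging Face tokenizer for t5-small; the caption and the spans chosen for masking are illustrative only:

```python
# Minimal sketch of T5-style span corruption with alternating sentinel tokens.
# Assumes the Hugging Face `transformers` tokenizer for t5-small; the caption
# and the masked spans are illustrative, not the actual training data.
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# Original caption: "a brightly lit room completely filled with many tacos"
# Each masked span in the input is replaced by the next sentinel token,
# and the target lists the masked spans in order, separated by the same sentinels.
source = "a <extra_id_0> room completely filled <extra_id_1> tacos"
target = "<extra_id_0> brightly lit <extra_id_1> with many <extra_id_2>"

enc = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
print(enc.input_ids.shape, labels.shape)
```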
The slow bleed should help preserve most of the internal structure while slowly attenuating it, reshaping the weights at the high and low ends of the curve around each batch directly. The intent is a bit of a cascading 4D bleed effect: one middle-ground high-end topic swapped against one middle-ground low-end topic's valuation, for learning and rephrasing.
In the process I've introduced inverse weighting to compensate for having too many of one token and too few of another: the lowest-frequency tokens get a boost in influence, without overfitting everything to them at a minimal scale, while the overall effect of the highest-frequency token is reduced. This guards against overfitting to a generic linear flood of common tokens, allows training on much smaller captions, and avoids obliterating the entire structure of T5-small in less than a few thousand steps.
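For illustration, here's a minimal sketch of what that inverse-frequency weighting could look like as a per-token loss weight. The function names, smoothing constant, and clamp bounds are my own assumptions, not the exact values used here:

```python
# Minimal sketch of inverse-frequency token weighting applied to the per-token loss.
# Assumes `token_counts` are corpus-wide counts indexed by vocab id; the smoothing
# constant and clamp bounds are illustrative, not the values actually used.
import torch
import torch.nn.functional as F

def inverse_frequency_weights(token_counts: torch.Tensor,
                              smoothing: float = 1.0,
                              min_w: float = 0.25,
                              max_w: float = 4.0) -> torch.Tensor:
    freq = token_counts.float() + smoothing
    weights = freq.mean() / freq            # rare tokens > 1, common tokens < 1
    return weights.clamp(min_w, max_w)      # keep either extreme from dominating

def weighted_lm_loss(logits: torch.Tensor,    # (batch, seq, vocab)
                     labels: torch.Tensor,    # (batch, seq), -100 = ignore
                     weights: torch.Tensor) -> torch.Tensor:
    per_tok = F.cross_entropy(logits.transpose(1, 2), labels,
                              ignore_index=-100, reduction="none")
    mask = labels.ne(-100)
    w = weights[labels.clamp(min=0)] * mask   # zero out ignored positions
    return (per_tok * w).sum() / w.sum().clamp(min=1e-8)
```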
Additionally, the highest and lowest tokens are automatically weighted up or down, and once a token is masked, the weighting automatically rescales the structure around the variant being masked and attended to.
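One plausible reading of that auto-rescaling step, sketched below: when a token is dropped from the loss by masking, the remaining tokens' weights are renormalized so the batch keeps the same total weight. The function name and the renormalization rule are assumptions, not the exact mechanism:

```python
# Minimal sketch of rescaling the remaining token weights after masking.
# The renormalization rule (preserve the per-sequence total weight) is an
# assumption used only to illustrate the idea.
import torch

def rescale_after_masking(weights: torch.Tensor,   # (batch, seq) per-token weights
                          masked: torch.Tensor     # (batch, seq) bool, True = masked out
                          ) -> torch.Tensor:
    kept = weights * (~masked)
    total_before = weights.sum(dim=-1, keepdim=True)
    total_after = kept.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    return kept * (total_before / total_after)      # redistribute the masked weight
```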
Over time this will teach the model to predict the middle-curve portions of variant captions, allowing it to eventually conform to the normalization of the new "middle-ground" prediction, whose purpose is to swap elements from one caption to another. This should allow a very high learning rate without completely destroying T5-small, thanks to the various regularization techniques engineered for this task today.
This process taps into training methods similar to those originally applied to T5-small, while also introducing newer methods designed and implemented on much larger-scale training runs, with weighted substructures meant to not rip the arms and legs off of this little model.
It won't do exactly what I wanted yet, but that's where the high-complexity captions will come into play. They are based entirely on compartmentalizing the critical systems into usable sub-sections, for example:
example prompt: a room of tacos
potential goal: a brightly lit room completely filled with many tacos
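As a rough illustration of how a prompt/goal pair like that could be fed to T5-small as a seq2seq example, here is a single-pair sketch; the "expand:" task prefix and the one-off forward pass are assumptions, not the actual pipeline:

```python
# Minimal sketch of feeding a prompt -> expanded-goal pair through T5-small.
# The "expand:" prefix is a hypothetical task prefix; a real run would batch
# many pairs and step an optimizer against the returned loss.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompt = "expand: a room of tacos"
goal = "a brightly lit room completely filled with many tacos"

enc = tokenizer(prompt, return_tensors="pt")
labels = tokenizer(goal, return_tensors="pt").input_ids

outputs = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                labels=labels)
print(float(outputs.loss))   # loss for this single pair
```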