I have had issues when using just three loss terms. I am not sure how you have it set up, but you might consider calculating which per-loss gradients are convergent and which are divergent, then updating in the direction you wish to go: where you have multiple agreeing experts and one or more disagreeing experts, you choose to "side" with either the agreeing majority or the divergent ones.
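A minimal sketch of what I mean, assuming you can get each loss's gradient as a flat vector (the function and its agreement-by-cosine-similarity rule are my own illustration, not anyone's actual training code):

```python
import numpy as np

def combine_expert_gradients(grads, side_with_majority=True):
    """Combine per-loss ("expert") gradients by agreement.

    grads: list of 1-D numpy arrays, one gradient per loss term.
    An expert is "convergent" if its gradient points roughly the same
    way as the mean gradient (positive cosine similarity with it).
    """
    g = np.stack(grads)                      # (n_experts, n_params)
    mean = g.mean(axis=0)
    # Cosine similarity of each expert's gradient with the mean direction.
    sims = (g @ mean) / (np.linalg.norm(g, axis=1) * np.linalg.norm(mean) + 1e-12)
    convergent = sims > 0
    # "Side" with the agreeing majority, or deliberately with the divergent experts.
    chosen = convergent if side_with_majority else ~convergent
    if not chosen.any():                     # degenerate case: fall back to plain averaging
        return mean
    return g[chosen].mean(axis=0)
```

You would then step the weights with the returned vector instead of the naive sum of all loss gradients.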
User Name: Felldude
AI & ML interests: Photo restoration, ML, 3D modeling, Reverse Engineering
Recent Activity
replied to AbstractPhil's post about 2 hours ago
geolip-captionbert-8192
This BERT is currently being distilled from five BERT teachers on the Conceptual Captions dataset. The recall accuracy is measured against the whitened Procrustes alignment, and the losses are designed to keep that rotation correctly aligned.
The smaller prototypes suggest this model will reach 100% recall accuracy, aligning specifically to the most optimal teacher opinions for each correct answer in conjunction with all the geometric losses.
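For readers unfamiliar with the metric: a whitened-Procrustes recall check can be sketched roughly like this (plain numpy, my own function names and setup for illustration, not the actual geolip-captionbert-8192 training code):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """ZCA-style whitening: zero mean, approximately identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

def procrustes_rotation(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F (classic orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def recall_at_1(student, teacher):
    """Whiten both embedding sets, rotate student onto teacher, and check
    that each student row's nearest teacher row is its own paired row."""
    S, T = whiten(student), whiten(teacher)
    aligned = S @ procrustes_rotation(S, T)
    d = ((aligned[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    return (d.argmin(axis=1) == np.arange(len(S))).mean()
```

Keeping the Procrustes rotation stable during distillation is then a matter of penalizing the student whenever this alignment degrades.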
No joke, this may be the smallest, cheapest-to-compute, most accurate, and fastest BERT I've trained thus far - and it will be based entirely on five teachers simultaneously feeding opinions through a relay hub.
updated a model 3 days ago
Felldude/Qwen3-VL-8B-Instruct-Uncensored
published a model 4 days ago
Felldude/Qwen3-VL-8B-Instruct-Uncensored