Query about Base Model
Hi LeDissolution,
I find your model StatSuite_G2B_Alpha wonderful! I'm looking into this model's training recipe to better understand its behavioral alignment.
Is the base model ibm-granite/granite-3.1-2b-base, and could you share some of the hyperparameters used during training?
Would you mind sharing some insights here, or perhaps providing an email address for a more detailed technical discussion?
Thanks for your contribution to the community!
Hi,
It is, indeed, based on granite-3.1-2b-base (although there is a newer version based on their new 1.6B model, which delivers about the same, or arguably better, performance in a 33% smaller footprint).
I'm not sure what exact hyperparameters were used for that particular version of the model (there have been a few dozen iterations), but all the training code and configs can be found here: https://github.com/leDissolution/TrainingPipeline
You can email me at unstoppable_dissolution@proton.me if you want.
(The project has basically been on hold for some months now due to me wallowing in my depression more and more, but I hope to gather enough energy to get back to it sooner rather than later. Anything smaller than the big GLM still can't handle scene continuity, despite my hopes for recent advances.)