Query about Base Model
Hi LeDissolution,
I find your model StatSuite_G2B_Alpha wonderful! I'm looking into this model's training recipe to better understand its behavioral alignment.
Is the base model ibm-granite/granite-3.1-2b-base, and could you share some of the hyperparameters used during training?
Would you mind sharing some insights here, or perhaps providing an email address for a more detailed technical discussion?
Thanks for your contribution to the community!
Hi,
It is, indeed, based on granite-3.1-2b-base (although there is a newer version based on their new 1.6B model, which delivers about the same, or arguably better, performance in a 33% smaller footprint).
I'm not sure what exact hyperparameters were used for that particular version of the model (there have been a few dozen iterations), but all the training code and configs can be found here: https://github.com/leDissolution/TrainingPipeline
You can email me at unstoppable_dissolution@proton.me if you want.
(The project has basically been on hold for some months now due to me wallowing in my depression more and more, but I hope to gather enough energy to get back to it sooner rather than later. Anything smaller than the big GLM still can't handle scene continuity, despite my hopes for recent advances.)