A tour of 14B finetuning

Opened by @sometimesanotion

You have done some finetunes on a variety of 14B architectures, joining a trend of starting with Virtuoso, Lamarck, and Qwenvergence - a combination I am enjoying too! Did you notice anything about the amount of finetuning required to get high and stable performance across the various attempts?

Hi @sometimesanotion, yes indeed, I'm starting to explore model merging (it gives impressive results!). As for fine-tuning, what I see for the moment is that it only takes a few training steps, between 200 and 1k DPO steps, to get something efficient. Anyway, I'm still continuing my experiments ;)
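
For reference, a run in that 200-1k step range could look roughly like the sketch below using TRL's DPOTrainer. The model and dataset names are placeholders (not the actual recipe used here), and the hyperparameters are only illustrative assumptions:

```python
# Minimal DPO run sketch with TRL; names and hyperparameters are illustrative,
# not the exact recipe discussed in this thread.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "your-org/your-14b-base"  # placeholder for the 14B checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data needs "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-data", split="train")  # placeholder

args = DPOConfig(
    output_dir="dpo-14b-out",
    max_steps=500,                  # somewhere in the 200-1k range mentioned above
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    beta=0.1,                       # strength of the DPO preference term
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,     # older TRL versions take tokenizer= instead
)
trainer.train()
```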

Very impressive for German as well ... maybe you can push it a bit further with more tuning ;)

Thanks @kalle07 ! Glad it works well in German too. Yes, I'm working on a 2.1 version to improve it further ;)
