You should consider doing Mistral Nemo with the same dataset.

by UniversalLove333 - opened

I tried 12+ fine-tunes of Small, and they're all the same...
I could never get Mistral Small to work properly; it's not as creative or coherent as Nemo.
Small feels rigid and stiff about what it wants to generate, same as Gemma 3.

Nemo feels much more flexible and easier to work with, same as Llama 3 - 3.3.

Sure, it can be considered if I find a source of spare compute.
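For anyone curious what a run like that involves, the sketch below is roughly the shape of it: a minimal LoRA fine-tune with transformers + peft. The base checkpoint is the public Nemo instruct release, but the dataset file, hyperparameters, and target modules are placeholder assumptions for illustration, not the actual recipe behind these models:

```python
# Minimal LoRA fine-tune sketch for Mistral Nemo (transformers + peft).
# Dataset path and hyperparameters are placeholders, not a real recipe.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "mistralai/Mistral-Nemo-Instruct-2407"  # public Nemo instruct checkpoint
DATA = "dataset.jsonl"                         # hypothetical file with a "text" column

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Nemo has no dedicated pad token

model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA adapters keep the VRAM/compute bill low, which matters when the
# bottleneck is spare compute. Rank and targets here are illustrative.
model = get_peft_model(
    model,
    LoraConfig(
        r=32,
        lora_alpha=64,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ),
)

ds = load_dataset("json", data_files=DATA, split="train")
ds = ds.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=4096),
    remove_columns=ds.column_names,
)

Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="nemo-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        learning_rate=1e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=ds,
    # mlm=False makes the collator copy input_ids into labels for causal LM
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```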

I've also considered taking a crack at largestral or command-a, but I don't think those have the quality to justify the cost over using L3.3. Unfortunately there's not a lot of movement in the large-but-not-ginormous open-weights model space lol.

Your old Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss is how I found you; that one was a killer!!
I don't know anything about training, but IME, if the default model is good at writing and creativity, the fine-tunes are usually good too. Like, default Nemo, Llama, and maybe Gemma 2 and Qwen 2 - QwQ are good and creative. But I could never get Gemma 3, Qwen 3, or Small to work... They're all linear and strict with what they want to generate.
