Did Mixtral start from Mistral or from scratch?
Hi Mistral Team,
Thank you for sharing this excellent set of models.
I just saw a paper from Upstage AI that uses Mistral 7B weights to build a 10B-parameter model, and they compared it to the Mixture-of-Experts architecture. I'm not sure whether Mixtral (Mistral 7Bx8) was trained from scratch or initialized from pretrained weights like Mistral 7B, since that approach would be easy to adopt. The blog post only says it was "pre-trained on data extracted from the open Web".
Thanks in advance!
"Exciting times with the new Mixtral model from @MistralAI
! It’s evident that they’ve fine-tuned the Mistral 7B model to an impressive 8x. The significant correlation between the weights of the two models is a testament to the successful reuse of models. This approach could empower the OSS community with its own robust MoE!"
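If it helps the discussion, here is a minimal sketch of how one might check that claimed weight correlation locally. It assumes the mistralai/Mistral-7B-v0.1 and mistralai/Mixtral-8x7B-v0.1 checkpoints on the Hugging Face Hub, that the transformers parameter naming (self_attn.*_proj, mlp.gate_proj/up_proj/down_proj vs. block_sparse_moe.experts.{i}.w1/w2/w3) is unchanged, and that you have enough CPU RAM to hold both models. It is not an official comparison from the Mistral team, just a way to probe the question.

```python
# Minimal sketch: compare Mistral 7B weights with Mixtral 8x7B weights.
# Assumptions: both checkpoints fit in CPU RAM, and parameter names match
# the transformers MistralForCausalLM / MixtralForCausalLM layouts.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

mistral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
mixtral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)
mistral_sd, mixtral_sd = mistral.state_dict(), mixtral.state_dict()

layer = 0  # comparing a single transformer block keeps the loop cheap

# Attention projections have the same names in both architectures.
for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
    key = f"model.layers.{layer}.self_attn.{proj}.weight"
    a = mistral_sd[key].float().flatten()
    b = mixtral_sd[key].float().flatten()
    cos = F.cosine_similarity(a, b, dim=0).item()
    print(f"{proj}: cosine similarity = {cos:.4f}")

# Mistral's MLP (gate/up/down) lines up with each expert's w1/w3/w2 in Mixtral.
gate = mistral_sd[f"model.layers.{layer}.mlp.gate_proj.weight"].float().flatten()
for expert in range(8):
    key = f"model.layers.{layer}.block_sparse_moe.experts.{expert}.w1.weight"
    w1 = mixtral_sd[key].float().flatten()
    cos = F.cosine_similarity(gate, w1, dim=0).item()
    print(f"expert {expert} w1 vs gate_proj: cosine similarity = {cos:.4f}")
```

Roughly speaking, similarities close to 1.0 (especially for the attention projections) would support the "initialized from Mistral 7B" reading, while values near 0 would point toward training from scratch.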