Did Mixtral start from Mistral or from scratch?
Hi Mistral Team,
Thank you for sharing this excellent set of models.
I just saw a paper from Upstage AI that uses Mistral 7B weights to build a 10B-parameter model, and they compared it to the Mixture-of-Experts architecture. I'm not sure whether Mixtral (Mistral 7Bx8) was trained from scratch or initialized from pretrained weights like Mistral 7B, since that approach would be easy to adopt. The blog post only says it was "pre-trained on data extracted from the open Web".
Thanks in advance!
"Exciting times with the new Mixtral model from @MistralAI
! It’s evident that they’ve fine-tuned the Mistral 7B model to an impressive 8x. The significant correlation between the weights of the two models is a testament to the successful reuse of models. This approach could empower the OSS community with its own robust MoE!"
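If it helps the discussion, here is a minimal sketch of how one might check that claimed weight correlation locally. It assumes the mistralai/Mistral-7B-v0.1 and mistralai/Mixtral-8x7B-v0.1 checkpoints on the Hugging Face Hub, that the transformers parameter naming (self_attn.*_proj, mlp.gate_proj/up_proj/down_proj vs. block_sparse_moe.experts.{i}.w1/w2/w3) is unchanged, and that you have enough CPU RAM to hold both models. It is not an official comparison from the Mistral team, just a way to probe the question.

```python
# Minimal sketch: compare Mistral 7B weights with Mixtral 8x7B weights.
# Assumptions: both checkpoints fit in CPU RAM, and parameter names match
# the transformers MistralForCausalLM / MixtralForCausalLM layouts.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

mistral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
mixtral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)
mistral_sd, mixtral_sd = mistral.state_dict(), mixtral.state_dict()

layer = 0  # comparing a single transformer block keeps the loop cheap

# Attention projections have the same names in both architectures.
for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
    key = f"model.layers.{layer}.self_attn.{proj}.weight"
    a = mistral_sd[key].float().flatten()
    b = mixtral_sd[key].float().flatten()
    cos = F.cosine_similarity(a, b, dim=0).item()
    print(f"{proj}: cosine similarity = {cos:.4f}")

# Mistral's MLP (gate/up/down) lines up with each expert's w1/w3/w2 in Mixtral.
gate = mistral_sd[f"model.layers.{layer}.mlp.gate_proj.weight"].float().flatten()
for expert in range(8):
    key = f"model.layers.{layer}.block_sparse_moe.experts.{expert}.w1.weight"
    w1 = mixtral_sd[key].float().flatten()
    cos = F.cosine_similarity(gate, w1, dim=0).item()
    print(f"expert {expert} w1 vs gate_proj: cosine similarity = {cos:.4f}")
```

Roughly speaking, similarities close to 1.0 (especially for the attention projections) would support the "initialized from Mistral 7B" reading, while values near 0 would point toward training from scratch.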