Inquiry About Phi-3 Pre-Training Dataset Composition
#103, opened by Zieksy
I’m reaching out to ask whether any information is available about the pre-training datasets used for Phi-3, so that I can better understand this great model. I would greatly appreciate details on any of the following: the specific datasets involved, the general categories of data and their approximate ratios, or guidance on how to reproduce the pre-training dataset.
Thank you in advance for any insights or resources you can share!
I am also looking for this. Any leads?