Link to the code + set-up used?
Hi! Really awesome project, and congrats on only dropping a couple of points to 76~77 MMLU! 🔥
Beating Claude Haiku and Mixtral, and even trading blows with Bigxtral, at 42b is wild 🤯
I had a pipe dream of doing something similar to this, but with the 8b version to make a 4~5b Llama-3. Kind of ambitious, since I'd be hoping to do it in MLX. I was also hoping to document the whole thing in a Jupyter notebook and open-source it to the community. Is there any chance you could share the code / pipeline you used for the layer selection and pruning?
Also eager to see whether you try the same on the 70b-instruct!
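In case it helps frame the question, this is roughly the layer-selection heuristic I was picturing: score each transformer block by the cosine similarity between its input and output hidden states, and treat the most "pass-through" blocks as pruning candidates. Just a sketch under my own assumptions (plain PyTorch/transformers rather than MLX, placeholder model ID and calibration text), not necessarily your pipeline:

```python
# Rough sketch: rank transformer blocks by how little they change the hidden
# state (high input/output cosine similarity => candidate for removal).
# Model ID and calibration text below are placeholders, not the authors' setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Some representative calibration text goes here."  # placeholder sample
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[i] is the input to block i; hidden_states[i + 1] is its output.
hidden = out.hidden_states
scores = []
for i in range(len(hidden) - 1):
    h_in, h_out = hidden[i].float(), hidden[i + 1].float()
    cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1).mean().item()
    scores.append((i, cos))

# Blocks whose output is most similar to their input are the cheapest to drop;
# actual pruning would then remove them from model.model.layers and "heal"
# the model with a short finetune.
for i, cos in sorted(scores, key=lambda t: -t[1])[:8]:
    print(f"block {i}: mean cos sim {cos:.4f}")
```

The same idea should port to MLX by capturing per-block activations in a modified forward pass, but I haven't tried that yet.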
Hi! I agree that it is an awesome project :)
I would also love to see something like this on the instruction-tuned model!!!
@mark-arts
have you found anything in that regard since you posted this comment?
Thanks in advance