What is the model architecture?

#2
opened by ewre324

Hello, nice to see such a small model. I was wondering if the model is based on the Llama/TinyLlama architecture?
Also, could the authors please share the steps taken to train the model?

BEEspoke Data org

Hi! Thanks for the kind words. You can find the model arch on the base model’s page: https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA

It's a "smol" version of the llama2 architecture with a hidden size of 1024. We fine-tuned the model from the base pretrained checkpoint with axolotl: https://openaccess-ai-collective.github.io/axolotl/
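If you want to check the architecture details yourself, here is a minimal sketch (assuming the `transformers` library is installed and the Hub is reachable) that loads the base model's config and prints the fields identifying the Llama architecture, the 1024 hidden size, and the grouped-query attention setup:

```python
# Minimal sketch: inspect the base model's config from the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("BEE-spoke-data/smol_llama-220M-GQA")

print(config.model_type)           # expected: "llama"
print(config.hidden_size)          # expected: 1024
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads -> GQA
```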
