# smol_llama: 220M GQA

Model card is a WIP; more details to come.
A small 220M-parameter (total) decoder-only model. This is the first version of the model.

- hidden size 1024, 10 layers
- GQA (32 query heads, 8 key/value heads), context length 2048 (see the config sketch after this list)
- trained from scratch on a single GPU :)
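To make the spec list concrete, here is a minimal sketch of the corresponding Hugging Face `LlamaConfig`, assuming the Llama-style architecture the name implies. Vocabulary size and feed-forward width are left at library defaults because this card does not state them, so the printed parameter count will not match 220M exactly.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Values taken from the spec list above; vocab_size / intermediate_size are
# library defaults here because the card does not state them.
config = LlamaConfig(
    hidden_size=1024,
    num_hidden_layers=10,
    num_attention_heads=32,   # query heads
    num_key_value_heads=8,    # GQA: 8 shared key/value head groups
    max_position_embeddings=2048,
)
model = LlamaForCausalLM(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters (with default vocab/FFN sizes)")
```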
## Links
Here are some fine-tunes we did, but there are many more possibilities out there!
- instruct
- code
- python (pypi)
- zephyr DPO tune
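The base model and these fine-tunes should load through the standard `transformers` causal-LM API. A minimal sketch, with a placeholder repo id since this card does not state the exact hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/smol_llama-220M-GQA"  # placeholder; substitute the actual hub path
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "My favorite movie is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```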
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 29.44 |
| AI2 Reasoning Challenge (25-shot) | 24.83 |
| HellaSwag (10-shot) | 29.76 |
| MMLU (5-shot) | 25.85 |
| TruthfulQA (0-shot) | 44.55 |
| Winogrande (5-shot) | 50.99 |
| GSM8k (5-shot) | 0.68 |
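These numbers come from the Open LLM Leaderboard harness. For a local sanity check, here is a minimal sketch of rerunning a single task with `lm-evaluation-harness` (v0.4+ Python API); the repo id is a placeholder, and the task name and few-shot count follow the table above:

```python
import lm_eval

# "your-namespace/smol_llama-220M-GQA" is a placeholder hub path.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-namespace/smol_llama-220M-GQA",
    tasks=["arc_challenge"],  # AI2 Reasoning Challenge
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```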
## Evaluation results

Per-task metrics as reported by the Open LLM Leaderboard:

| Benchmark | Split | Metric | Value |
|---|---|---|---|
| AI2 Reasoning Challenge (25-shot) | test | normalized accuracy | 24.83 |
| HellaSwag (10-shot) | validation | normalized accuracy | 29.76 |
| MMLU (5-shot) | test | accuracy | 25.85 |
| TruthfulQA (0-shot) | validation | mc2 | 44.55 |
| Winogrande (5-shot) | validation | accuracy | 50.99 |
| GSM8k (5-shot) | test | accuracy | 0.68 |