---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: PyTorch
---
## A deep and narrow Mistral model (950M params)
This checkpoint is a small (950M parameters), deep and narrow (40 layers, hidden size 1440) Mistral-style model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 dump). It is a base model and has not undergone instruction or chat fine-tuning.
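As a rough sanity check on the 950M figure, the parameter count of a deep-and-narrow decoder can be estimated from the stated depth (40 layers) and hidden size (1440). The vocabulary size and feed-forward width below are assumptions, not values stated in this card, so the result is a ballpark figure rather than the exact checkpoint size:

```python
# Rough parameter-count estimate for a 40-layer, hidden-size-1440 decoder.
# vocab_size and ffn_dim are ASSUMPTIONS; the card only states depth and width.
def estimate_params(n_layers=40, d_model=1440, vocab_size=32000, ffn_dim=4 * 1440):
    embed = vocab_size * d_model   # token embeddings (assumes a tied LM head)
    attn = 4 * d_model * d_model   # Wq, Wk, Wv, Wo (ignores grouped-query sharing)
    mlp = 3 * d_model * ffn_dim    # gate, up, down projections (SwiGLU-style MLP)
    norms = 2 * d_model            # two RMSNorm weight vectors per layer
    return embed + n_layers * (attn + mlp + norms)

print(f"{estimate_params() / 1e9:.2f}B parameters")  # prints 1.37B
```

With the conventional 4x FFN width this comes out near 1.37B, which suggests the actual checkpoint uses a narrower FFN, a smaller vocabulary, or grouped-query attention to land at 950M; the sketch is only meant to show how depth and width drive the total.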
LM Evaluation Harness results:
| Benchmark | Result |
| ----- | ----- |
| ARC-Challenge (arc_c) | 0.2884 |
| ARC-Easy (arc_e) | 0.5139 |
| BoolQ (boolq) | 0.6089 |
| HellaSwag (hellaswag) | 0.5888 |
| OpenBookQA (obqa) | 0.3280 |
| PIQA (piqa) | 0.7388 |
| Social IQa (siqa) | 0.4038 |
| WinoGrande (wino) | 0.5627 |