---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: pytorch
---
|
|
|
## A deep and narrow Mistral model (950M params)
|
This checkpoint is a small (950M params), deep and narrow (40 layers, hidden size = 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
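As a back-of-envelope check on the 950M figure, the parameter count can be estimated from the stated depth and width. The sketch below assumes a Mistral-style decoder (SwiGLU MLP, RMSNorm, tied embeddings) with a guessed 32k vocabulary and plain multi-head attention; the intermediate size of 3312 is chosen for illustration so the total lands near 950M, and is not taken from the released config.

```python
def decoder_params(n_layers, hidden, intermediate, vocab,
                   n_heads, n_kv_heads, tied_embeddings=True):
    """Rough parameter count for a Mistral-style decoder-only model."""
    head_dim = hidden // n_heads
    # Attention: Q and O projections are hidden x hidden;
    # K and V shrink with grouped-query attention (n_kv_heads < n_heads).
    attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return n_layers * per_layer + embeddings + final_norm

# Illustrative values only: vocab size, head counts, and intermediate
# size are assumptions, not read from the checkpoint's config.
total = decoder_params(n_layers=40, hidden=1440, intermediate=3312,
                       vocab=32000, n_heads=16, n_kv_heads=16)
print(f"{total / 1e6:.0f}M parameters")  # ~950M under these assumptions
```

With 40 layers at hidden size 1440, an MLP intermediate size around 3300 (or a smaller one combined with grouped-query attention) is what makes the arithmetic land near 950M; the actual configuration may differ.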
|
|
|
It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not gone through instruction or chat fine-tuning.
|
|
|
LM Evaluation Harness results:
|
| Benchmark | Result |
| ----- | ----- |
| arc_c | 0.2884 |
| arc_e | 0.5139 |
| boolq | 0.6089 |
| hellaswag | 0.5888 |
| obqa | 0.3280 |
| piqa | 0.7388 |
| siqa | 0.4038 |
| wino | 0.5627 |