---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: pytorch
---
# A deep and narrow Mistral model (950M params)
This checkpoint is a small (950M parameters), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post]. It is intended for edge applications.
It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not gone through instruction or chat fine-tuning.
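Since this is a standard Mistral-architecture checkpoint, it should load with the usual `transformers` auto classes. A minimal sketch (the repo id is taken from the metadata above; the prompt is just an illustration, and the first call downloads the weights):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cckm/tinymistral_950m"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Load the checkpoint and return a greedy completion of `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Edge devices are useful because"))
```

As a base model it will continue text rather than follow instructions, so prompts should be phrased as completions.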
## LM Evaluation Harness results
| Benchmark | Result |
|---|---|
| arc_c | 0.2884 |
| arc_e | 0.5139 |
| boolq | 0.6089 |
| hellaswag | 0.5888 |
| obqa | 0.3280 |
| piqa | 0.7388 |
| siqa | 0.4038 |
| wino | 0.5627 |
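For a quick single-number comparison against other small base models, the unweighted mean of the eight scores above can be computed; a minimal sketch, with the values copied from the table:

```python
# LM Evaluation Harness scores from the table above.
scores = {
    "arc_c": 0.2884,
    "arc_e": 0.5139,
    "boolq": 0.6089,
    "hellaswag": 0.5888,
    "obqa": 0.3280,
    "piqa": 0.7388,
    "siqa": 0.4038,
    "wino": 0.5627,
}

# Unweighted mean over the eight benchmarks.
mean = sum(scores.values()) / len(scores)
print(f"{mean:.4f}")  # → 0.5042
```

Note this mean weights all benchmarks equally, regardless of task size or difficulty.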