---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: PyTorch
---

## A deep and narrow Mistral model (950M params)

This checkpoint is a small (950M params), deep and narrow (40 layers, hidden size = 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is meant for edge applications.

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to epoch 202418).

It is a base model and has not gone through instruct or chat fine-tuning.

LM Harness results:

| Benchmark | Result |
| --------- | ------ |
| arc_c     | 0.2884 |
| arc_e     | 0.5139 |
| boolq     | 0.6089 |
| hellaswag | 0.5888 |
| obqa      | 0.3280 |
| piqa      | 0.7388 |
| siqa      | 0.4038 |
| wino      | 0.5627 |
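Since this is a standard Mistral-architecture checkpoint, it should load with the usual `transformers` text-generation flow. A minimal sketch, assuming the weights are hosted under the `cckm/tinymistral_950m` repo id listed in the metadata (the exact repo id and prompt below are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# repo id taken from the card's metadata; adjust if the weights live elsewhere
model_id = "cckm/tinymistral_950m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# this is a base model (no chat template), so prompt it with plain text
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the model is not instruct-tuned, treat it as a text-completion engine: give it a prefix to continue rather than a question or instruction.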