---
license: mit
datasets:
  - tiiuae/falcon-refinedweb
  - HuggingFaceFW/fineweb
base_model:
  - cckm/tinymistral_950m
language:
  - en
pipeline_tag: text-generation
library_name: PyTorch
---

# A deep and narrow Mistral model (950M params)

This checkpoint is a small (950M parameters), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post]. It is intended for edge applications.
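To put "deep and narrow" in perspective, the sketch below compares this model's depth-to-width ratio against Mistral-7B (32 layers, hidden size 4096, per its public config); the comparison itself is illustrative and not part of the card.

```python
# Compare depth-to-width ratios: this checkpoint vs. Mistral-7B.
# Mistral-7B figures (32 layers, hidden size 4096) come from its public config.

def depth_to_width(num_layers: int, hidden_size: int) -> float:
    """Ratio of transformer depth (layers) to hidden width."""
    return num_layers / hidden_size

tiny = depth_to_width(40, 1440)        # this checkpoint
mistral_7b = depth_to_width(32, 4096)  # reference model

print(f"tinymistral_950m: {tiny:.4f}")
print(f"Mistral-7B:       {mistral_7b:.4f}")
print(f"relative depth:   {tiny / mistral_7b:.1f}x")
```

At roughly 3.6x the depth-to-width ratio of Mistral-7B, the "deep and narrow" label is apt.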

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not undergone instruction or chat fine-tuning.
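As a base model, it should load through the standard `transformers` causal-LM API. The sketch below is a hypothetical usage example: it assumes the checkpoint is hosted under the `cckm/tinymistral_950m` repo id from the metadata with standard Mistral config and tokenizer files, and uses greedy decoding purely for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint ships standard Mistral config/tokenizer files
# under the cckm/tinymistral_950m repo id listed in the metadata above.
repo_id = "cckm/tinymistral_950m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Plain completion: no chat template, since the model is not instruct-tuned.
inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```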

LM Harness numbers:

| Benchmark | Result |
|-----------|--------|
| arc_c     | 0.2884 |
| arc_e     | 0.5139 |
| boolq     | 0.6089 |
| hellaswag | 0.5888 |
| obqa      | 0.3280 |
| piqa      | 0.7388 |
| siqa      | 0.4038 |
| wino      | 0.5627 |
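These numbers can in principle be reproduced with EleutherAI's lm-evaluation-harness. The command below is a sketch only: the task-name mapping (arc_c → arc_challenge, obqa → openbookqa, siqa → social_iqa, wino → winogrande) and the zero-shot setting are assumptions, since the card does not state the exact evaluation setup.

```shell
# Sketch: run lm-evaluation-harness on the tasks in the table above.
# Task names and shot count are assumptions, not stated by the card.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=cckm/tinymistral_950m \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,social_iqa,winogrande \
  --batch_size auto
```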