---
license: mit
datasets:
  - tiiuae/falcon-refinedweb
  - HuggingFaceFW/fineweb
base_model:
  - cckm/tinymistral_950m
language:
  - en
pipeline_tag: text-generation
library_name: PyTorch
---

# A deep and narrow Mistral model (950M params)

This checkpoint is a small (950M parameters), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post]. It is intended for edge applications.
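To put "deep and narrow" in perspective, the sketch below compares this model's depth-to-width ratio against Mistral-7B (32 layers, hidden size 4096, per its public config); the comparison itself is illustrative and not part of the card.

```python
# Compare depth-to-width ratios: this checkpoint vs. Mistral-7B.
# Mistral-7B figures (32 layers, hidden size 4096) come from its public config.

def depth_to_width(num_layers: int, hidden_size: int) -> float:
    """Ratio of transformer depth (layers) to hidden width."""
    return num_layers / hidden_size

tiny = depth_to_width(40, 1440)        # this checkpoint
mistral_7b = depth_to_width(32, 4096)  # reference model

print(f"tinymistral_950m: {tiny:.4f}")
print(f"Mistral-7B:       {mistral_7b:.4f}")
print(f"relative depth:   {tiny / mistral_7b:.1f}x")
```

At roughly 3.6x the depth-to-width ratio of Mistral-7B, the "deep and narrow" label is apt.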

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not undergone instruction or chat fine-tuning.
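As a base model, it should load through the standard `transformers` causal-LM API. The sketch below is a hypothetical usage example: it assumes the checkpoint is hosted under the `cckm/tinymistral_950m` repo id from the metadata with standard Mistral config and tokenizer files, and uses greedy decoding purely for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint ships standard Mistral config/tokenizer files
# under the cckm/tinymistral_950m repo id listed in the metadata above.
repo_id = "cckm/tinymistral_950m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Plain completion: no chat template, since the model is not instruct-tuned.
inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```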

LM Harness numbers:

| Benchmark | Result |
|-----------|--------|
| arc_c     | 0.2884 |
| arc_e     | 0.5139 |
| boolq     | 0.6089 |
| hellaswag | 0.5888 |
| obqa      | 0.3280 |
| piqa      | 0.7388 |
| siqa      | 0.4038 |
| wino      | 0.5627 |
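These numbers can in principle be reproduced with EleutherAI's lm-evaluation-harness. The command below is a sketch only: the task-name mapping (arc_c → arc_challenge, obqa → openbookqa, siqa → social_iqa, wino → winogrande) and the zero-shot setting are assumptions, since the card does not state the exact evaluation setup.

```shell
# Sketch: run lm-evaluation-harness on the tasks in the table above.
# Task names and shot count are assumptions, not stated by the card.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=cckm/tinymistral_950m \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,social_iqa,winogrande \
  --batch_size auto
```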