cckm/tinymistral_950m

Text Generation


cckm committed 25 days ago

Commit b8ac79e · verified · 1 Parent(s): 562f9ec

Update README.md

Files changed (1)

README.md +3 -1


@@ -3,13 +3,15 @@ license: mit
 datasets:
 - tiiuae/falcon-refinedweb
 - HuggingFaceFW/fineweb
+base_model:
+  - cckm/tinymistral_950m
 language:
 - en
 pipeline_tag: text-generation
 library_name: PyTorch
 ---
-## A deep and narrow, Mistral model (950M params)
+## A deep and narrow Mistral model (950M params)
 This checkpoint is for a small (950M params), deep and narrow (40 layers, hidden size=1440) Mistral model, as described in this [[blog post]](https://epsilons.ai/blog.html#post1_3). It is meant for edge applications.
 It was trained with ~400B tokens from RefinedWeb, and ~400B tokens from FineWeb (up to epoch 202418). It is a base model, and has not gone through instruct or chat fine-tuning.
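
The README gives the model shape as 40 layers with hidden size 1440. As a rough illustration of how depth and width turn into a parameter count for a Mistral-style decoder, here is a minimal sketch. Only the layer count and hidden size come from the README; the intermediate size, head counts, vocabulary size, and untied embeddings used below are hypothetical placeholders, so the printed total should not be expected to match 950M exactly.

```python
def mistral_param_count(n_layers, hidden, intermediate, n_heads, n_kv_heads,
                        vocab, tie_embeddings=False):
    """Approximate parameter count of a Mistral-style decoder (no biases)."""
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads          # grouped-query attention: fewer K/V heads
    attn = 2 * hidden * hidden              # q_proj and o_proj
    attn += 2 * hidden * kv_dim             # k_proj and v_proj
    mlp = 3 * hidden * intermediate         # gate, up, and down projections
    norms = 2 * hidden                      # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden                  # input token embeddings
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return n_layers * per_layer + embed + lm_head + final_norm


# 40 layers and hidden=1440 are from the README; every other value is an assumption.
total = mistral_param_count(
    n_layers=40,
    hidden=1440,
    intermediate=4096,   # hypothetical
    n_heads=12,          # hypothetical
    n_kv_heads=4,        # hypothetical
    vocab=32000,         # hypothetical
)
print(f"approx. parameters: {total / 1e6:.0f}M")
```

Because the per-layer cost grows roughly with the square of the hidden size while the total grows only linearly with depth, a narrow model can afford many layers within the same budget, which is the trade-off the "deep and narrow" description refers to.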