cckm
/

tinymistral_950m

Text Generation

cckm committed 26 days ago

Commit

562f9ec

·

verified ·

1 Parent(s): 10e0e83

Upload README.md

Files changed (1)

README.md +27 -3

@@ -1,3 +1,27 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- tiiuae/falcon-refinedweb
+- HuggingFaceFW/fineweb
+language:
+- en
+pipeline_tag: text-generation
+library_name: PyTorch
+---
+## A deep and narrow Mistral model (950M params)
+This checkpoint is a small (950M params), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [[blog post]](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
+It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to epoch 202418). It is a base model and has not undergone instruct or chat fine-tuning.
+LM Harness numbers:
+| Benchmark | Result |
+| ----- | ----- |
+| arc_c | 0.2884 |
+| arc_e | 0.5139 |
+| boolq | 0.6089 |
+| hellaswag | 0.5888 |
+| obqa | 0.3280 |
+| piqa | 0.7388 |
+| siqa | 0.4038 |
+| wino | 0.5627 |
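
As a quick way to read the numbers in the README above, the following sketch computes two derived figures: the unweighted mean of the eight reported benchmark scores and the approximate number of training tokens per parameter. Neither figure is reported by the author. Both are simple arithmetic on the values in the diff, and the token and parameter counts are the rounded "~400B" and "950M" figures, so the ratio is only approximate.

```python
# Scores copied from the LM Harness table in the README diff.
scores = {
    "arc_c": 0.2884,
    "arc_e": 0.5139,
    "boolq": 0.6089,
    "hellaswag": 0.5888,
    "obqa": 0.3280,
    "piqa": 0.7388,
    "siqa": 0.4038,
    "wino": 0.5627,
}

# Unweighted mean across tasks: a derived summary, not an author-reported result.
mean_score = sum(scores.values()) / len(scores)

# Approximate training budget from the README text:
# ~400B tokens (RefinedWeb) + ~400B tokens (FineWeb), ~950M parameters.
train_tokens = 400e9 + 400e9
n_params = 950e6
tokens_per_param = train_tokens / n_params

print(f"mean score over {len(scores)} tasks: {mean_score:.4f}")
print(f"approx. training tokens per parameter: {tokens_per_param:.0f}")
```

The mean weights every task equally regardless of its size or chance baseline, so it is best treated as a rough single-number summary rather than a basis for comparing models.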