Upload README.md
README.md
@@ -1,3 +1,27 @@
----
-license: mit
-
+---
+license: mit
+datasets:
+- tiiuae/falcon-refinedweb
+- HuggingFaceFW/fineweb
+language:
+- en
+pipeline_tag: text-generation
+library_name: PyTorch
+---
+
+## A deep and narrow Mistral model (950M params)
+This checkpoint is a small (950M parameters), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
+
+It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to epoch 202418). It is a base model and has not undergone instruction or chat fine-tuning.
+
+LM Harness numbers:
+| Benchmark | Result |
+| ----- | ----- |
+| arc_c | 0.2884 |
+| arc_e | 0.5139 |
+| boolq | 0.6089 |
+| hellaswag | 0.5888 |
+| obqa | 0.3280 |
+| piqa | 0.7388 |
+| siqa | 0.4038 |
+| wino | 0.5627 |
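
The LM Harness table added in this diff lists eight per-task scores but no aggregate. As a rough reading aid, the sketch below copies those scores into a dictionary and computes an unweighted mean plus the strongest and weakest task. The names `lm_harness_scores`, `mean_score`, and `best_and_worst` are hypothetical helpers introduced here, and the unweighted mean is an assumption of this sketch, not a metric reported by the README.

```python
# Scores copied verbatim from the "LM Harness numbers" table in the README diff.
lm_harness_scores = {
    "arc_c": 0.2884,
    "arc_e": 0.5139,
    "boolq": 0.6089,
    "hellaswag": 0.5888,
    "obqa": 0.3280,
    "piqa": 0.7388,
    "siqa": 0.4038,
    "wino": 0.5627,
}


def mean_score(scores):
    """Return the unweighted arithmetic mean of the task scores.

    Assumption: every task is weighted equally; the README does not
    define an aggregate, so this is only one possible summary.
    """
    return sum(scores.values()) / len(scores)


def best_and_worst(scores):
    """Return the names of the highest- and lowest-scoring tasks."""
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return best, worst


if __name__ == "__main__":
    best, worst = best_and_worst(lm_harness_scores)
    print(f"unweighted mean: {mean_score(lm_harness_scores):.4f}")
    print(f"highest: {best} ({lm_harness_scores[best]:.4f})")
    print(f"lowest: {worst} ({lm_harness_scores[worst]:.4f})")
```

Because the tasks differ in number of answer choices and therefore in chance-level accuracy, the mean is best used only to compare checkpoints evaluated on the same task set.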