Update README.md
README.md CHANGED
@@ -3,13 +3,15 @@ license: mit
 datasets:
 - tiiuae/falcon-refinedweb
 - HuggingFaceFW/fineweb
+base_model:
+- cckm/tinymistral_950m
 language:
 - en
 pipeline_tag: text-generation
 library_name: PyTorch
 ---

-## A deep and narrow
+## A deep and narrow Mistral model (950M params)
 This checkpoint is for a small (950M params), deep and narrow (40 layers, hidden size=1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is meant for edge applications.

 It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 dump). It is a base model, and has not gone through instruct or chat fine-tuning.
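A minimal usage sketch for the updated card (not part of the commit itself): it assumes the checkpoint named in the `base_model` field, `cckm/tinymistral_950m`, loads through Hugging Face transformers' `AutoModelForCausalLM`. Since the card's `library_name` is PyTorch, that loading path is an assumption, and the prompt is arbitrary.

```python
# Hypothetical loading sketch; assumes the repo ships
# transformers-compatible Mistral weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cckm/tinymistral_950m"  # from the base_model field above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

# The "deep and narrow" shape described in the card should be visible in the config.
print(model.config.num_hidden_layers, model.config.hidden_size)  # expect 40, 1440

# Base model, no instruct/chat tuning: use plain text completion, not a chat template.
inputs = tokenizer("Edge devices benefit from small language models because", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because this is a base model, greedy or sampled completion of a raw prompt is the appropriate usage; chat-style prompting would only apply after instruct fine-tuning.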