Update README.md
README.md CHANGED
@@ -13,6 +13,7 @@ datasets:
- marin-community/datashop-science-qa
- marin-community/stackexchange-markdown
- marin-community/wikipedia-markdown
# REMINDER: when the instruct model is released, add dependencies on the instruct datasets and the base model.
language:
- en
tags:

@@ -83,25 +84,32 @@ We release a large number of checkpoints.

Main Page: [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base)

| Name              | Training Tokens | Link                                                                                                       |
|-------------------|-----------------|------------------------------------------------------------------------------------------------------------|
| `main`            | 12.7T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/main)            |
| `kestrel`         | 2.7T            | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/kestrel)         |
| `ocelot`          | 3.78T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/ocelot)          |
| `jellyfish`       | 4.78T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/jellyfish)       |
| `phoenix`         | 11.1T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/phoenix)         |
| `starling`        | 12.4T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/starling)        |
| `deeper-starling` | 12.7T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/deeper-starling) |

`main` currently refers to `deeper-starling`. This may change in the future, but we will maintain compatibility at the architecture and tokenizer level, so the model will remain drop-in compatible with existing tooling. If you require a specific checkpoint, please use the `revision` argument.
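For instance, a pinned revision can be loaded as follows (a sketch assuming the `transformers` library; `BASE_REVISIONS` and `load_base` are illustrative helpers, not part of this repo):

```python
# Revision -> training tokens, from the checkpoint table above.
BASE_REVISIONS = {
    "main": "12.7T",
    "kestrel": "2.7T",
    "ocelot": "3.78T",
    "jellyfish": "4.78T",
    "phoenix": "11.1T",
    "starling": "12.4T",
    "deeper-starling": "12.7T",
}

def load_base(revision: str = "main"):
    """Load tokenizer and model at a specific revision (downloads the full weights)."""
    if revision not in BASE_REVISIONS:
        raise ValueError(f"unknown revision {revision!r}; expected one of {sorted(BASE_REVISIONS)}")
    # Heavy dependency, imported lazily so the revision table is usable without it.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("marin-community/marin-8b-base", revision=revision)
    model = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base", revision=revision)
    return tokenizer, model
```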

### Instruct Model Checkpoints

Main Page: [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct)

| Name                    | SFT Tokens | Link                                                                                                                     |
|-------------------------|------------|--------------------------------------------------------------------------------------------------------------------------|
| `main`                  | 5.3B       | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/deeper-starling-05-15) |
| `deeper-starling-05-15` | 5.3B       | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/deeper-starling-05-15) |

`main` currently refers to `deeper-starling-05-15`. This may change in the future, though we will maintain model compatibility. If you require a specific checkpoint, please use the `revision` argument.

## Installation

Marin 8B uses the [Llama architecture](https://arxiv.org/abs/2302.13971) and as such should

@@ -158,15 +166,15 @@ marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base", re

We ran a suite of standard benchmarks to compare our model with [Llama 3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) and the open-source 7-8B models [OLMo 2 7B](https://huggingface.co/allenai/OLMo-2-1124-7B) and [MAP NEO 7B](https://huggingface.co/m-a-p/neo_7b).
For all benchmarks, we used [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) with the default setup for each task. (These numbers may differ from reported results due to differences in setup; LM Eval Harness is usually somewhat stricter than other harnesses.)
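Such a run can be reproduced roughly as follows (a sketch using LM Eval Harness's `simple_evaluate` API; the task list and batch size here are illustrative, not our exact setup, and `eval_config` is a hypothetical helper):

```python
def eval_config(revision="deeper-starling", tasks=("arc_easy", "hellaswag")):
    """Assemble arguments for an LM Eval Harness run on a pinned base revision."""
    return {
        "model": "hf",
        "model_args": f"pretrained=marin-community/marin-8b-base,revision={revision},dtype=bfloat16",
        "tasks": list(tasks),
        "batch_size": 8,
    }

def run_eval(**kwargs):
    """Run the evaluation (requires `lm-eval` installed and downloads the model)."""
    import lm_eval  # lazy import; heavy dependency
    return lm_eval.simple_evaluate(**eval_config(**kwargs))
```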

|                                      | Average  | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BBH      | BoolQ    | CommonSense QA | COPA     | GPQA     | HellaSwag 0-shot | HellaSwag 10-shot | lambada_openai | MMLU 5-shot | MMLU 0-shot | MMLU Pro | OpenBookQA | PIQA     | WinoGrande | WSC      |
|--------------------------------------|----------|------------------|----------|---------------|----------|----------|----------------|----------|----------|------------------|-------------------|----------------|-------------|-------------|----------|------------|----------|------------|----------|
| Marin 8B Base <br/>(Deeper Starling) | **68.3** | 20.9             | **86.5** | **63.1**      | **50.6** | **85.9** | 79.1           | **92.0** | 30.3     | **82.3**         | **83.6**          | **74.7**       | **67.6**    | **65.9**    | **36.5** | 44.2       | **84.4** | **74.5**   | 82.1     |
| Llama 3.1 Base                       | 67.0     | 20.4             | 85.8     | 58.9          | 46.4     | 84.2     | 75.2           | **92.0** | **32.3** | 79.4             | 81.9              | **74.7**       | 66.4        | 65.5        | 33.3     | 45.8       | 82.9     | 74.4       | 83.5     |
| OLMo 2 Base                          | 66.7     | 17.4             | 85.0     | 60.7          | 44.4     | 85.5     | 75.4           | 89.0     | 26.8     | 80.5             | 81.7              | 73.1           | 63.9        | 61.9        | 30.6     | **46.2**   | 82.5     | 74.3       | **86.1** |
| MAP NEO 7B                           | 62.2     | **23.0**         | 81.1     | 52.0          | 42.4     | 84.7     | **81.7**       | 82.0     | 27.8     | 72.5             | 73.3              | 64.6           | 58.2        | 56.4        | 25.2     | 39.4       | 79.0     | 66.1       | 73.3     |

Marin 8B Base fares well on most of these tasks.

## Model Details

@@ -194,8 +202,8 @@ Marin 8B uses a variant of the Llama 3 tokenizer: [stanford-crfm/marin-tokenizer

- *Ocelot (DCLM WSD Phase)*: Increased batch size, using WSD. (2.7T->3.78T tokens)
- *Jellyfish (First Cooldown)*: Higher-quality data (~Dolmino+Fine Math). (3.78T->4.78T tokens)
- *Phoenix (Reheated)*: Rapid rewarming + [Nemotron-CC](https://arxiv.org/abs/2412.02595) (plus [Starcoder](https://huggingface.co/datasets/bigcode/starcoderdata)). (4.78T->11.1T tokens)
- *Starling (Second Cooldown)*: Another cooldown, following a similar process to the first but adding a few new datasets. (11.1T->12.4T tokens)
- *Deeper Starling*: Somewhat more pretraining. (12.4T->12.7T tokens)

All released pre-training checkpoints except Kestrel use an exponential moving average of the model weights.
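One EMA step updates each weight as `ema <- decay * ema + (1 - decay) * w`; a minimal sketch (the decay value below is illustrative, not the value used in training):

```python
def ema_update(ema_params, params, decay=0.999):
    """One EMA step over flat weight lists: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```

The released checkpoints store the averaged weights, which track a smoother, slightly lagged version of the raw training trajectory.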

@@ -216,5 +224,3 @@ For errors in this model card, please open an issue in this repository. For tech

## Acknowledgements

The compute for this model was generously provided by Google's [TPU Research Cloud](https://sites.research.google/trc/about/).
-
-(We based this model card on Olmo 2's.)