dlwh committed
Commit 64eb300 · verified · 1 Parent(s): d57287a

Update README.md

Files changed (1): README.md (+28 -22)
README.md CHANGED
@@ -13,6 +13,7 @@ datasets:
  - marin-community/datashop-science-qa
  - marin-community/stackexchange-markdown
  - marin-community/wikipedia-markdown
+ # REMINDER: when the instruct model is released, add dependencies on the instruct datasets and the base model.
  language:
  - en
  tags:
@@ -83,25 +84,32 @@ We release a large number of checkpoints.

  Main Page: [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base)

- (More checkpoints are being uploaded right now.)
-
- | Name | Training Tokens | Link |
- |------|--------|-------------|
- | `deeper-starling` | 13.7T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/deeper-starling) |
-
- `main` currently refers to `deeper-starling`. This may change in the future, though we will maintain model compatibility. If you require a specific checkpoint, please use the `revision` argument.
+ | Name | Training Tokens | Link |
+ |-------------------|-----------------|------------------------------------------------------------------------------------------------------------|
+ | `main` | 12.7T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/main) |
+ | `kestrel` | 2.7T | [kestrel](https://huggingface.co/marin-community/marin-8b-base/tree/kestrel) |
+ | `ocelot` | 3.78T | [ocelot](https://huggingface.co/marin-community/marin-8b-base/tree/ocelot) |
+ | `jellyfish` | 4.78T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/jellyfish) |
+ | `phoenix` | 11.1T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/phoenix) |
+ | `starling` | 12.4T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/starling) |
+ | `deeper-starling` | 12.7T | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/deeper-starling) |
+
+ `main` currently refers to `deeper-starling`.
+ This may change in the future, but we will maintain compatibility at the architecture and tokenizer level,
+ so the model will remain drop-in compatible with existing tooling.
+ If you require a specific checkpoint, please use the `revision` argument.

  ### Instruct Model Checkpoints

  Main Page: [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct)

- | Name | Training Tokens | Link |
- |------|--------|-------------|
- | `deeper-starling-05-15` | 5.3B | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/) |
+ | Name | SFT Tokens | Link |
+ |-------------------------|------------|---------------------------------------------------------------------------------------------------------------------------|
+ | `main` | 5.3B | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/main) |
+ | `deeper-starling-05-15` | 5.3B | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/deeper-starling-05-15) |

  `main` currently refers to `deeper-starling-05-15`. This may change in the future, though we will maintain model compatibility. If you require a specific checkpoint, please use the `revision` argument.

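Pinning one of the revisions above instead of tracking `main` uses the `revision` argument to `from_pretrained`. A minimal sketch; the repo id and revision names come from the tables above, while the prompt and generation settings are illustrative assumptions:

```python
# Minimal sketch: load a pinned Marin 8B Base checkpoint by revision.
# The prompt and max_new_tokens below are illustrative, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("marin-community/marin-8b-base")
model = AutoModelForCausalLM.from_pretrained(
    "marin-community/marin-8b-base",
    revision="deeper-starling",  # pin a checkpoint instead of tracking `main`
)

inputs = tokenizer("The Marin Headlands are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
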
 
-
  ## Installation

  Marin 8B uses the [Llama architecture](https://arxiv.org/abs/2302.13971) and as such should
@@ -158,15 +166,15 @@ marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base", re
  We ran a suite of standard benchmarks to compare our model with [Llama 3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) and the open-source 7-8B models [Olmo 2 7B](https://huggingface.co/allenai/OLMo-2-1124-7B) and [MAP NEO 7B](https://huggingface.co/m-a-p/neo_7b).
  For all benchmarks, we used [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) with the default setup for each task. (These numbers may differ from reported results due to differences in setup. LM Eval Harness is usually somewhat stricter than other harnesses.)

- | | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BBH | BoolQ | CommonSense QA | COPA | GPQA | HellaSwag 0-shot | HellaSwag 10-shot | lambada_openai | MMLU 5-shot | MMLU 0-shot | MMLU Pro | OpenBookQA | PIQA | WinoGrande | WSC | GSM8K |
- |--------------------------|----------|------------------|----------|---------------|----------|----------|----------------|----------|----------|------------------|-------------------|----------------|-------------|-------------|----------|------------|----------|------------|----------|----------|
- | Marin 8B Base (Starling) | **66.6** | 20.9 | **86.5** | **63.1** | **50.6** | **85.9** | 79.1 | **92.0** | 30.3 | **82.3** | **83.6** | **74.7** | **67.6** | **65.9** | **36.5** | 44.2 | **84.4** | **74.5** | 82.1 | 61.3 |
- | Llama 3.1 Base | 65.3 | 20.4 | 85.8 | 58.9 | 46.4 | 84.2 | 75.2 | **92.0** | **32.3** | 79.4 | 81.9 | **74.7** | 66.4 | 65.5 | 33.3 | 45.8 | 82.9 | 74.4 | 83.5 | 56.8 |
- | OLMo 2 Base | 64.9 | 17.4 | 85.0 | 60.7 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | 80.5 | 81.7 | 73.1 | 63.9 | 61.9 | 30.6 | **46.2** | 82.5 | 74.3 | **86.1** | **67.6** |
- | MAP NEO 7B | 59.5 | **23.0** | 81.1 | 52.0 | 42.4 | 84.7 | **81.7** | 82.0 | 27.8 | 72.5 | 73.3 | 64.6 | 58.2 | 56.4 | 25.2 | 39.4 | 79.0 | 66.1 | 73.3 | 48.0 |
+ | | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BBH | BoolQ | CommonSense QA | COPA | GPQA | HellaSwag 0-shot | HellaSwag 10-shot | lambada_openai | MMLU 5-shot | MMLU 0-shot | MMLU Pro | OpenBookQA | PIQA | WinoGrande | WSC |
+ |--------------------------------------|----------|------------------|----------|---------------|----------|----------|----------------|----------|----------|------------------|-------------------|----------------|-------------|-------------|----------|------------|----------|------------|----------|
+ | Marin 8B Base <br/>(Deeper Starling) | **68.3** | 20.9 | **86.5** | **63.1** | **50.6** | **85.9** | 79.1 | **92.0** | 30.3 | **82.3** | **83.6** | **74.7** | **67.6** | **65.9** | **36.5** | 44.2 | **84.4** | **74.5** | 82.1 |
+ | Llama 3.1 Base | 67.0 | 20.4 | 85.8 | 58.9 | 46.4 | 84.2 | 75.2 | **92.0** | **32.3** | 79.4 | 81.9 | **74.7** | 66.4 | 65.5 | 33.3 | 45.8 | 82.9 | 74.4 | 83.5 |
+ | OLMo 2 Base | 66.7 | 17.4 | 85.0 | 60.7 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | 80.5 | 81.7 | 73.1 | 63.9 | 61.9 | 30.6 | **46.2** | 82.5 | 74.3 | **86.1** |
+ | MAP NEO 7B | 62.2 | **23.0** | 81.1 | 52.0 | 42.4 | 84.7 | **81.7** | 82.0 | 27.8 | 72.5 | 73.3 | 64.6 | 58.2 | 56.4 | 25.2 | 39.4 | 79.0 | 66.1 | 73.3 |

- Marin 8B Base fares well on most tasks.
+ Marin 8B Base fares well on most of these tasks.

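A minimal sketch of scoring one of these tasks with the harness's Python API; the task choice, batch size, and `hf` backend string are assumptions about local setup, not settings from the card:

```python
# Minimal sketch: evaluate a Marin checkpoint on one task with LM Eval Harness.
# The task name and batch size are illustrative; the card reports results with
# the harness's default setup for each task.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=marin-community/marin-8b-base,revision=deeper-starling",
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])
```
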
  ## Model Details
@@ -194,8 +202,8 @@ Marin 8B uses a variant of the Llama 3 tokenizer: [stanford-crfm/marin-tokenizer
  - *Ocelot (DCLM WSD Phase)*: Increased batch size, using WSD. (2.7T->3.78T tokens)
  - *Jellyfish (First Cooldown)*: Higher quality data (~Dolmino+Fine Math). (3.78T->4.78T tokens)
  - *Phoenix (Reheated)*: Rapid rewarming + [Nemotron-CC](https://arxiv.org/abs/2412.02595) (plus [Starcoder](https://huggingface.co/datasets/bigcode/starcoderdata)). (4.78T->11.1T tokens)
- - *Starling (Second Cooldown)*: Another cooldown. We followed a similar process to the first cooldown, but added a few new datasets. (11.1T->12.75T tokens)
- - *Deeper Starling*: Somewhat more pretraining. (12.75T->13.7T tokens)
+ - *Starling (Second Cooldown)*: Another cooldown. We followed a similar process to the first cooldown, but added a few new datasets. (11.1T->12.4T tokens)
+ - *Deeper Starling*: Somewhat more pretraining. (12.4T->12.7T tokens)

  All released pre-training checkpoints except Kestrel use an exponential moving average of the model weights.

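For reference, a generic exponential-moving-average update over weights looks like the sketch below; the decay value and update cadence are assumptions, since the card does not specify Marin's schedule:

```python
# Generic EMA over a dict of parameters (illustrative only; Marin's actual
# decay and update cadence are not specified in this model card).
def ema_update(ema_weights, weights, decay=0.995):
    """One step of: ema <- decay * ema + (1 - decay) * weights."""
    return {name: decay * ema_weights[name] + (1.0 - decay) * w
            for name, w in weights.items()}

# Toy usage with floats standing in for parameter tensors:
ema = {"w": 1.0}
ema = ema_update(ema, {"w": 0.0})
print(ema["w"])  # 0.995
```
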
 
@@ -216,5 +224,3 @@ For errors in this model card, please open an issue in this repository. For tech
  ## Acknowledgements

  The compute for this model was generously provided by Google's [TPU Research Cloud](https://sites.research.google/trc/about/).
-
- (We based this model card on Olmo 2's.)