WillHeld committed
Commit c257af7 · verified · 1 parent: 64eb300

Update README.md

Files changed (1): README.md (+7 −7)
README.md CHANGED
@@ -166,13 +166,13 @@ marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base", re
 We ran a suite of standard benchmarks to compare our model with [Llama 3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), and the open source 7-8B models [Olmo 2 7B](https://huggingface.co/allenai/OLMo-2-1124-7B), and [MAP NEO 7B](https://huggingface.co/m-a-p/neo_7b).
 For all benchmarks, we used [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) with the default setup for each task. (These numbers may differ from reported results due to differences in setup. LM Eval Harness is usually somewhat stricter than other harnesses.)
 
-| | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BBH | BoolQ | CommonSense QA | COPA | GPQA | HellaSwag 0-shot | HellaSwag 10-shot | lambada_openai | MMLU 5-shot | MMLU 0-shot | MMLU Pro | OpenBookQA | PIQA | WinoGrande | WSC |
-|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-| Marin 8B Base <br/>(Deeper Starling) | **68.3** | 20.9 | **86.5** | **63.1** | **50.6** | **85.9** | 79.1 | **92.0** | 30.3 | **82.3** | **83.6** | **74.7** | **67.6** | **65.9** | **36.5** | 44.2 | **84.4** | **74.5** | 82.1 |
-| Llama 3.1 Base | 67.0 | 20.4 | 85.8 | 58.9 | 46.4 | 84.2 | 75.2 | **92.0** | **32.3** | 79.4 | 81.9 | **74.7** | 66.4 | 65.5 | 33.3 | 45.8 | 82.9 | 74.4 | 83.5 |
-| OLMo 2 Base | 66.7 | 17.4 | 85.0 | 60.7 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | 80.5 | 81.7 | 73.1 | 63.9 | 61.9 | 30.6 | **46.2** | 82.5 | 74.3 | **86.1** |
-| MAP NEO 7B | 62.2 | **23.0** | 81.1 | 52.0 | 42.4 | 84.7 | **81.7** | 82.0 | 27.8 | 72.5 | 73.3 | 64.6 | 58.2 | 56.4 | 25.2 | 39.4 | 79.0 | 66.1 | 73.3 |
-
+| Model | Average | AGI Eval LSAT-AR | ARC Challenge | ARC Easy | BBH | BoolQ | CommonSense QA | COPA | GPQA | GSM8K | HellaSwag 10-shot | HellaSwag 0-shot | lambada_openai | MMLU Pro | MMLU 5-shot | MMLU 0-shot | OpenBookQA | PIQA | WinoGrande | WSC |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| Marin 8B Base <br/>(Deeper Starling) | **66.6** | 20.9 | **63.1** | **86.5** | **50.6** | **85.9** | 79.1 | **92.0** | 30.3 | 61.3 | **83.6** | **82.3** | **74.7** | **36.5** | **67.6** | **65.9** | 44.2 | **84.4** | **74.5** | 82.1 |
+| Llama 3.1 Base | 65.3 | 20.4 | 58.9 | 85.8 | 46.4 | 84.2 | 75.2 | **92.0** | **32.3** | 56.8 | 81.9 | 79.4 | **74.7** | 33.3 | 66.4 | 65.5 | 45.8 | 82.9 | 74.4 | 83.5 |
+| OLMo 2 Base | 64.9 | 17.4 | 60.7 | 85.0 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | **67.6** | 81.7 | 80.5 | 73.1 | 30.6 | 63.9 | 61.9 | **46.2** | 82.5 | 74.3 | **86.1** |
+| MAP NEO 7B | 59.5 | **23.0** | 52.0 | 81.1 | 42.4 | 84.7 | **81.7** | 82.0 | 27.8 | 48.0 | 73.3 | 72.5 | 64.6 | 25.2 | 58.2 | 56.4 | 39.4 | 79.0 | 66.1 | 73.3 |
+| Amber 7B | 48.1 | 19.1 | 41.6 | 74.7 | 31.6 | 68.8 | 20.6 | 87.0 | 26.3 | 4.4 | 73.9 | 72.4 | 66.8 | 11.6 | 26.6 | 26.7 | 39.2 | 79.8 | 65.3 | 76.9 |
 
 Marin 8B Base fares well on most of these tasks.
 
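
For readers who want to reproduce rows of the table above, the README's note that every benchmark was run through LM Eval Harness "with the default setup for each task" maps to a short script. Below is a minimal sketch using the harness's `simple_evaluate` API; the task names chosen are illustrative assumptions, since the commit does not record the exact task identifiers behind each column.

```python
# Sketch: re-running a few of the table's benchmarks with LM Eval Harness.
# Assumes lm-evaluation-harness v0.4+ (`pip install lm-eval`); the task
# names below are assumptions, not taken from the commit itself.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=marin-community/marin-8b-base",
    tasks=["arc_easy", "arc_challenge", "gsm8k"],
    batch_size=8,
    # num_fewshot is left unset so each task keeps its default few-shot
    # configuration, matching "the default setup for each task".
)

# Print per-task metrics (e.g. acc / acc_norm) as aggregated by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```

The equivalent CLI invocation would be `lm_eval --model hf --model_args pretrained=marin-community/marin-8b-base --tasks arc_easy,arc_challenge,gsm8k`, again with the task list as an assumption.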