Adding Evaluation Results
#168
opened by leaderboard-pr-bot
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
 language:
 - fr
 - it
 - de
 - es
 - en
+license: apache-2.0
 inference:
   parameters:
     temperature: 0.5
@@ -13,6 +13,109 @@ widget:
 - messages:
   - role: user
     content: What is your favorite condiment?
+model-index:
+- name: Mixtral-8x7B-Instruct-v0.1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 70.14
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 87.55
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 71.4
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 64.98
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 81.06
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 61.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+      name: Open LLM Leaderboard
 ---
 # Model Card for Mixtral-8x7B
 The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
@@ -164,4 +267,17 @@ It does not have any moderation mechanisms. We're looking forward to engaging wi
 make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
 
 # The Mistral AI Team
-Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
+Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mistralai__Mixtral-8x7B-Instruct-v0.1)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |72.70|
+|AI2 Reasoning Challenge (25-Shot)|70.14|
+|HellaSwag (10-Shot) |87.55|
+|MMLU (5-Shot) |71.40|
+|TruthfulQA (0-shot) |64.98|
+|Winogrande (5-shot) |81.06|
+|GSM8k (5-shot) |61.11|
+
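For readers who want to cross-check the numbers: the summary table is derived from the `model-index` entries added to the YAML front matter, and the `Avg.` row is the mean of the six benchmark scores. Below is a minimal sketch (not part of this PR) that reads the card and recomputes that average; it assumes `pyyaml` and `huggingface_hub` are installed and takes the repo id from the leaderboard query string used in the `source` URLs.

```python
# Illustrative sketch only (not part of the PR): recompute the leaderboard
# average shown in the README table from the model-index block in the card's
# YAML front matter. Assumes pyyaml and huggingface_hub are installed.
import yaml
from huggingface_hub import hf_hub_download

# Fetch the model card; the repo id is taken from the leaderboard query
# string used in the model-index source URLs above.
card_path = hf_hub_download(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    filename="README.md",
)

# The front matter is the YAML block between the first two `---` markers.
with open(card_path, encoding="utf-8") as f:
    front_matter = yaml.safe_load(f.read().split("---", 2)[1])

# One metric value per benchmark, keyed by the dataset's display name.
results = front_matter["model-index"][0]["results"]
scores = {r["dataset"]["name"]: r["metrics"][0]["value"] for r in results}

for name, value in scores.items():
    print(f"{name:<35} {value:>6.2f}")

# The "Avg." row is the mean of the six scores; small differences against the
# table are possible because the per-task values are already rounded.
print(f"{'Avg.':<35} {sum(scores.values()) / len(scores):>6.2f}")
```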