Adding Evaluation Results

#168
Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -1,11 +1,11 @@
  ---
- license: apache-2.0
  language:
  - fr
  - it
  - de
  - es
  - en
+ license: apache-2.0
  inference:
    parameters:
      temperature: 0.5
@@ -13,6 +13,109 @@ widget:
  - messages:
    - role: user
      content: What is your favorite condiment?
+ model-index:
+ - name: Mixtral-8x7B-Instruct-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 70.14
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 87.55
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 71.4
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 64.98
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 81.06
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.11
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mistralai/Mixtral-8x7B-Instruct-v0.1
+       name: Open LLM Leaderboard
  ---
  # Model Card for Mixtral-8x7B
  The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
@@ -164,4 +267,17 @@ It does not have any moderation mechanisms. We're looking forward to engaging wi
  make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

  # The Mistral AI Team
- Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
+ Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mistralai__Mixtral-8x7B-Instruct-v0.1)
+
+ | Metric                          |Value|
+ |---------------------------------|----:|
+ |Avg.                             |72.70|
+ |AI2 Reasoning Challenge (25-Shot)|70.14|
+ |HellaSwag (10-Shot)              |87.55|
+ |MMLU (5-Shot)                    |71.40|
+ |TruthfulQA (0-shot)              |64.98|
+ |Winogrande (5-shot)              |81.06|
+ |GSM8k (5-shot)                   |61.11|
+
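Once merged, the `model-index` block added above becomes machine-readable metadata: the Hub parses it into structured evaluation results. Below is a minimal sketch (not part of this PR) of reading those results back with `huggingface_hub`; it assumes the card is fetched from the live repo, and `eval_results` will be empty until this change lands.

```python
# Sketch: read the eval results declared in the README front matter.
# Assumes huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import ModelCard

card = ModelCard.load("mistralai/Mixtral-8x7B-Instruct-v0.1")
for res in card.data.eval_results or []:
    # Each EvalResult mirrors one task/dataset/metric entry from model-index.
    print(res.dataset_name, res.metric_type, res.metric_value)
```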
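The Avg. row in the table is the arithmetic mean of the six benchmark scores. A quick sanity check in Python is below; note that recomputing from the rounded values shown in the table gives 72.71, while the leaderboard reports 72.70, presumably because it averages the unrounded per-task scores.

```python
# Sanity check of the "Avg." row against the six per-benchmark scores.
scores = [70.14, 87.55, 71.40, 64.98, 81.06, 61.11]
print(f"{sum(scores) / len(scores):.2f}")  # -> 72.71 (table reports 72.70)
```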