Felladrin leaderboard-pr-bot committed (verified)
Commit 02b57fe · 1 Parent(s): 43998dd

Adding Evaluation Results (#12)


- Adding Evaluation Results (de00aaf7b17533211e2ff1b2062c248037754921)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +167 -50
README.md CHANGED
@@ -1,64 +1,167 @@
  ---
  language:
- - en
+ - en
  license: apache-2.0
- datasets:
- - HuggingFaceH4/ultrachat_200k
- - Felladrin/ChatML-ultrachat_200k
- - Open-Orca/OpenOrca
- - Felladrin/ChatML-OpenOrca
- - hkust-nlp/deita-10k-v0
- - Felladrin/ChatML-deita-10k-v0
- - LDJnr/Capybara
- - Felladrin/ChatML-Capybara
- - databricks/databricks-dolly-15k
- - Felladrin/ChatML-databricks-dolly-15k
- - euclaise/reddit-instruct-curated
- - Felladrin/ChatML-reddit-instruct-curated
- - CohereForAI/aya_dataset
- - Felladrin/ChatML-aya_dataset
  base_model: Locutusque/TinyMistral-248M
+ datasets:
+ - HuggingFaceH4/ultrachat_200k
+ - Felladrin/ChatML-ultrachat_200k
+ - Open-Orca/OpenOrca
+ - Felladrin/ChatML-OpenOrca
+ - hkust-nlp/deita-10k-v0
+ - Felladrin/ChatML-deita-10k-v0
+ - LDJnr/Capybara
+ - Felladrin/ChatML-Capybara
+ - databricks/databricks-dolly-15k
+ - Felladrin/ChatML-databricks-dolly-15k
+ - euclaise/reddit-instruct-curated
+ - Felladrin/ChatML-reddit-instruct-curated
+ - CohereForAI/aya_dataset
+ - Felladrin/ChatML-aya_dataset
  pipeline_tag: text-generation
  widget:
- - messages:
-   - role: system
-     content:
-       You are a highly knowledgeable and friendly assistant. Your goal is to
-       understand and respond to user inquiries with clarity. Your interactions are
-       always respectful, helpful, and focused on delivering the most accurate information
-       to the user.
-   - role: user
-     content: Hey! Got a question for you!
-   - role: assistant
-     content: Sure! What's it?
-   - role: user
-     content: What are some potential applications for quantum computing?
- - messages:
-   - role: user
-     content: Heya!
-   - role: assistant
-     content: Hi! How may I help you?
-   - role: user
-     content:
-       I'm interested in developing a career in software engineering. What
-       would you recommend me to do?
- - messages:
-   - role: user
-     content: Morning!
-   - role: assistant
-     content: Good morning! How can I help you today?
-   - role: user
-     content: Could you give me some tips for becoming a healthier person?
- - messages:
-   - role: system
-     content: You are a very creative assistant. User will give you a task, which you should complete with all your knowledge.
-   - role: user
-     content: Hello! Can you please elaborate a background story of an RPG game about wizards and dragons in a sci-fi world?
+ - messages:
+   - role: system
+     content: You are a highly knowledgeable and friendly assistant. Your goal is to
+       understand and respond to user inquiries with clarity. Your interactions are
+       always respectful, helpful, and focused on delivering the most accurate information
+       to the user.
+   - role: user
+     content: Hey! Got a question for you!
+   - role: assistant
+     content: Sure! What's it?
+   - role: user
+     content: What are some potential applications for quantum computing?
+ - messages:
+   - role: user
+     content: Heya!
+   - role: assistant
+     content: Hi! How may I help you?
+   - role: user
+     content: I'm interested in developing a career in software engineering. What would
+       you recommend me to do?
+ - messages:
+   - role: user
+     content: Morning!
+   - role: assistant
+     content: Good morning! How can I help you today?
+   - role: user
+     content: Could you give me some tips for becoming a healthier person?
+ - messages:
+   - role: system
+     content: You are a very creative assistant. User will give you a task, which you
+       should complete with all your knowledge.
+   - role: user
+     content: Hello! Can you please elaborate a background story of an RPG game about
+       wizards and dragons in a sci-fi world?
  inference:
    parameters:
      max_new_tokens: 250
      penalty_alpha: 0.5
      top_k: 5
+ model-index:
+ - name: TinyMistral-248M-Chat-v2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 23.29
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 27.39
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 23.52
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 41.32
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 49.01
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-Chat-v2
+       name: Open LLM Leaderboard
  ---

  # Locutusque's TinyMistral-248M trained on chat datasets
@@ -147,3 +250,17 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
  | Optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
  | Scheduler | cosine |
  | Seed | 42 |
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__TinyMistral-248M-Chat-v2)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |27.42|
+ |AI2 Reasoning Challenge (25-Shot)|23.29|
+ |HellaSwag (10-Shot) |27.39|
+ |MMLU (5-Shot) |23.52|
+ |TruthfulQA (0-shot) |41.32|
+ |Winogrande (5-shot) |49.01|
+ |GSM8k (5-shot) | 0.00|
+
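
For context on how the card's `inference.parameters` block is meant to be used: those values configure the Hub's hosted inference widget, and the same settings can be passed to `generate()` locally. The sketch below is a minimal illustration, not part of this commit's diff; it assumes the `Felladrin/TinyMistral-248M-Chat-v2` repository ships a ChatML chat template (the `Felladrin/ChatML-*` datasets listed in the front matter suggest it does) and that `transformers` with PyTorch is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Felladrin/TinyMistral-248M-Chat-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# One of the widget conversations from the card's front matter.
messages = [
    {"role": "user", "content": "Could you give me some tips for becoming a healthier person?"},
]

# Assumes the repo's tokenizer config defines a ChatML chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Contrastive search with the card's suggested inference parameters
# (max_new_tokens=250, penalty_alpha=0.5, top_k=5).
outputs = model.generate(
    **inputs,
    max_new_tokens=250,
    penalty_alpha=0.5,
    top_k=5,
)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Setting `penalty_alpha` together with a small `top_k` selects contrastive search rather than greedy or sampling-based decoding, which is why both parameters appear together in the card's widget settings.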