Novaciano committed · Commit 82258d4 · verified · 1 Parent(s): ee2ad85

Adding Evaluation Results


This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions

Files changed (1): README.md (+114 -1)
README.md CHANGED
@@ -38,6 +38,105 @@ language:
 - en
 license: apache-2.0
 pipeline_tag: text-generation
+model-index:
+- name: La_Mejor_Mezcla-3.2-1B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 55.1
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 9.41
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 8.99
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.01
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.62
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 9.21
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLa_Mejor_Mezcla-3.2-1B
+      name: Open LLM Leaderboard
 ---
 
 <center><a href="https://ibb.co/YFCsj2MK"><img src="https://i.ibb.co/pB7FX28s/1559d4be98b5a26edf62ee40695ececc-high.jpg" alt="1559d4be98b5a26edf62ee40695ececc-high" border="0"></a></center>
@@ -85,4 +184,18 @@ base_model: bunnycore/FuseChat-3.2-1B-Creative-RP
 dtype: bfloat16
 parameters:
   t: [0, 0.5, 1, 0.5, 0]
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Novaciano__La_Mejor_Mezcla-3.2-1B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Novaciano%2FLa_Mejor_Mezcla-3.2-1B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             |Value (%)|
+|--------------------|--------:|
+|**Average**         |    14.06|
+|IFEval (0-Shot)     |    55.10|
+|BBH (3-Shot)        |     9.41|
+|MATH Lvl 5 (4-Shot) |     8.99|
+|GPQA (0-shot)       |     1.01|
+|MuSR (0-shot)       |     0.62|
+|MMLU-PRO (5-shot)   |     9.21|
+
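As a quick sanity check on the summary table the PR adds, the leaderboard's **Average** row is simply the unweighted arithmetic mean of the six benchmark scores. A minimal sketch (the dictionary below just restates the table; it is not part of the PR itself):

```python
# Benchmark scores from the added leaderboard table (values in %).
scores = {
    "IFEval (0-Shot)": 55.10,
    "BBH (3-Shot)": 9.41,
    "MATH Lvl 5 (4-Shot)": 8.99,
    "GPQA (0-shot)": 1.01,
    "MuSR (0-shot)": 0.62,
    "MMLU-PRO (5-shot)": 9.21,
}

# The "Average" column is the unweighted mean of the six scores.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 14.06
```

This reproduces the 14.06 reported in the **Average** row, confirming the table is internally consistent.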