LeroyDyer committed on
Commit 68280d5 · verified · 1 Parent(s): 0eaabed

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1): README.md (+111 −3)
README.md CHANGED

```diff
@@ -1,9 +1,8 @@
 ---
-license: mit
 language:
 - en
+license: mit
 library_name: transformers
-pipeline_tag: text2text-generation
 tags:
 - LCARS
 - Star-Trek
@@ -17,6 +16,102 @@ tags:
 - code
 - medical
 - text-generation-inference
+pipeline_tag: text2text-generation
+model-index:
+- name: LCARS_AI_StarTrek_Computer
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 35.83
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 21.78
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.08
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 2.35
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.44
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 16.2
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
+      name: Open LLM Leaderboard
 ---
 If anybody has star trek data please send as this starship computer database archive needs it!
 
@@ -34,4 +129,17 @@ So those models were merged into other models which had been specifically traine
 the models were heavily DPO trained, and various newer methodologies were installed: the Deep Mind series is a special series which contains self-correction, recall, visuo-spatial ... step-by-step thinking:
 
 So the multi-merge often fixes these errors between models as well as training gaps: hopefully they all took and merged well!
-Performing even unknown and unprogrammed tasks:
+Performing even unknown and unprogrammed tasks:
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/LeroyDyer__LCARS_AI_StarTrek_Computer-details)
+
+| Metric             |Value|
+|--------------------|----:|
+|Avg.                |14.61|
+|IFEval (0-Shot)     |35.83|
+|BBH (3-Shot)        |21.78|
+|MATH Lvl 5 (4-Shot) | 4.08|
+|GPQA (0-shot)       | 2.35|
+|MuSR (0-shot)       | 7.44|
+|MMLU-PRO (5-shot)   |16.20|
+
```
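The "Avg." row the PR adds to the table is just the unweighted mean of the six benchmark scores, rounded to two decimals. A quick sketch to check the arithmetic:

```python
# Scores from the evaluation table this PR adds (Open LLM Leaderboard).
scores = {
    "IFEval (0-Shot)": 35.83,
    "BBH (3-Shot)": 21.78,
    "MATH Lvl 5 (4-Shot)": 4.08,
    "GPQA (0-shot)": 2.35,
    "MuSR (0-shot)": 7.44,
    "MMLU-PRO (5-shot)": 16.20,
}

# The "Avg." row is the plain unweighted mean, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 14.61
```

(35.83 + 21.78 + 4.08 + 2.35 + 7.44 + 16.20) / 6 = 87.68 / 6 ≈ 14.61, matching the reported average.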
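The `model-index` block the PR inserts into the README front matter is what the Hub's leaderboard widget reads. A minimal sketch of pulling the metric values back out of such front matter, assuming PyYAML is installed (the trimmed YAML below shows one task only):

```python
import yaml

# A trimmed copy of the front matter this PR adds (one task shown).
card_yaml = """
model-index:
- name: LCARS_AI_StarTrek_Computer
  results:
  - task:
      type: text-generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 35.83
"""

meta = yaml.safe_load(card_yaml)

# Flatten to {dataset name: first metric value} for a quick overview.
results = {
    r["dataset"]["name"]: r["metrics"][0]["value"]
    for entry in meta["model-index"]
    for r in entry["results"]
}
print(results)  # {'IFEval (0-Shot)': 35.83}
```

The same flattening works on the full six-task block, since every `results` entry follows the identical task/dataset/metrics shape.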