Adding Evaluation Results (#1)

- Adding Evaluation Results (c97cefbd6ea95d3bf28c28ecd42c5d69fa2c5cfc)

Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)

README.md +109 -0

@@ -1,5 +1,100 @@
 ---
 license: apache-2.0
 ---
 ### 1. Model Details
@@ -25,3 +120,17 @@ model.generation_config.pad_token_id = model.generation_config.eos_token_id
 ```
 ### 3 Disclaimer
 The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

 ---
 license: apache-2.0
+model-index:
+- name: Llama3.1_CoT_V1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 24.53
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 20.17
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 1.21
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.91
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 16.42
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 20.06
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/Llama3.1_CoT_V1
+      name: Open LLM Leaderboard
 ---
 ### 1. Model Details
 ```
 ### 3 Disclaimer
 The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_xinchen9__Llama3.1_CoT_V1)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |14.38|
+|IFEval (0-Shot)    |24.53|
+|BBH (3-Shot)       |20.17|
+|MATH Lvl 5 (4-Shot)| 1.21|
+|GPQA (0-shot)      | 3.91|
+|MuSR (0-shot)      |16.42|
+|MMLU-PRO (5-shot)  |20.06|
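
As a quick sanity check on the table added by this commit, the `Avg.` row appears to be the unweighted arithmetic mean of the six benchmark scores. That is an assumption inferred from the numbers themselves, not something the PR states. A minimal Python sketch, with the scores copied from the table and a hypothetical helper name `leaderboard_average`:

```python
# Benchmark scores copied from the results table in this commit.
scores = {
    "IFEval (0-Shot)": 24.53,
    "BBH (3-Shot)": 20.17,
    "MATH Lvl 5 (4-Shot)": 1.21,
    "GPQA (0-shot)": 3.91,
    "MuSR (0-shot)": 16.42,
    "MMLU-PRO (5-shot)": 20.06,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")
```

Under that assumption the computed value matches the 14.38 reported in the `Avg.` row.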