future-xy committed on
Commit • 1118802
1 Parent(s): d6d7ec6
delete hallucination blog
blog/Hallucination-Leaderboard-Summary.csv
CHANGED
@@ -1,20 +1 @@
 Category,Benchmark,Dataset Link,Data Split,Data Size,Language
-Closed-book Open-domain QA ,NQ Open (64-shot),https://huggingface.co/datasets/nq_open/viewer/nq_open/validation,validation,3.61k,En
-Closed-book Open-domain QA ,NQ Open (8-shot),https://huggingface.co/datasets/nq_open/viewer/nq_open/validation,validation,3.61k,En
-Closed-book Open-domain QA ,TriviaQA (64-shot),https://huggingface.co/datasets/trivia_qa/viewer/rc.nocontext/test,test,17.2k,En
-Closed-book Open-domain QA ,TriviaQA 8 (8-shot),https://huggingface.co/datasets/trivia_qa/viewer/rc.nocontext/test,test,17.2k,En
-Closed-book Open-domain QA ,TruthfulQA MC1 (0-shot),https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice,mc1_targets column,0.8k,En
-Closed-book Open-domain QA ,TruthfulQA MC2 (0-shot),https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice,mc2_targets column,0.8k,En
-Fact-Checking,FEVER (16-shot),https://huggingface.co/datasets/fever/viewer/v1.0/labelled_dev,labeld_dev,37.6k,En
-Hallucination Detection,FaithDial (8-shot),https://huggingface.co/datasets/McGill-NLP/FaithDial,test,3.54k,En
-Hallucination Detection,HaluEval QA (0-shot),https://huggingface.co/datasets/pminervini/HaluEval/viewer/qa_samples,qa_samples,10k,En
-Hallucination Detection,HaluEval Summ (0-shot),https://huggingface.co/datasets/pminervini/HaluEval/viewer/summarization_samples,summarization_samples,10k,En
-Hallucination Detection,HaluEval Dial (0-shot),https://huggingface.co/datasets/pminervini/HaluEval/viewer/dialogue_samples,dialogue_samples,10k,En
-Hallucination Detection,TrueFalse (8-shot),https://huggingface.co/datasets/pminervini/true-false/viewer/default/cieacf,cieacf,6.09k,En
-Instruction Following,MemoTrap (0-shot),https://huggingface.co/datasets/pminervini/inverse-scaling/viewer/memo-trap,memo-trap,0.9k,En
-Instruction Following,IFEval (0-shot),https://huggingface.co/datasets/wis-k/instruction-following-eval,train,0.5k,En
-Reading Comprehension,SQuADv2 (4-shot),https://huggingface.co/datasets/squad_v2/viewer/squad_v2/validation,validation,11.9k,En
-Reading Comprehension,RACE (0-shot),https://huggingface.co/datasets/EleutherAI/race,test,1.05k,En
-Self-Consistency,SelfCheckGPT (0-shot),https://huggingface.co/datasets/potsawee/wiki_bio_gpt3_hallucination,validation,0.2k,En
-Summarisation,XSum (2-shot),https://huggingface.co/datasets/EdinburghNLP/xsum/viewer/default/test,test,11.3k,En
-Summarisation,CNN/DM (2-shot),https://huggingface.co/datasets/cnn_dailymail/viewer/3.0.0/test,test,11.5k,En
src/display/about.py
CHANGED
@@ -3,7 +3,7 @@ from src.display.utils import ModelType
 TITLE = """<h1 align="center" id="space-title">MOE LLM GPU-Poor Leaderboard</h1>"""
 
 INTRODUCTION_TEXT = """
-π The MOE LLM GPU-Poor Leaderboard aims to
+π The MOE LLM GPU-Poor Leaderboard aims to evaluate LLMs.
 
 
 """
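The hunk headers in this commit use unified-diff notation: "-start,count" describes the range in the old file and "+start,count" the range in the new file, with the count omitted when a range spans a single line. So "@@ -1,20 +1 @@" says twenty lines of the old CSV (the header plus 19 benchmark rows) become one line (the surviving header), while "@@ -3,7 +3,7 @@" is a same-size, one-line edit to src/display/about.py. The sketch below is a minimal Python parser for such headers; the names parse_hunk_header and Hunk are hypothetical, chosen for illustration, and are not part of this repository or of any diff library.

```python
import re
from typing import NamedTuple

# Matches a unified-diff hunk header such as "@@ -1,20 +1 @@".
# Any trailing context text (e.g. "from src.display.utils import ModelType") is ignored.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    """Hypothetical container for the old and new line ranges of one hunk."""
    old_start: int
    old_count: int
    new_start: int
    new_count: int


def parse_hunk_header(line: str) -> Hunk:
    """Parse a hunk header; an omitted count means the range spans one line."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


csv_hunk = parse_hunk_header("@@ -1,20 +1 @@")
print(csv_hunk)  # Hunk(old_start=1, old_count=20, new_start=1, new_count=1)
# Net line change; in this hunk every removed line is a deleted benchmark row.
print(csv_hunk.old_count - csv_hunk.new_count)  # 19

py_hunk = parse_hunk_header("@@ -3,7 +3,7 @@ from src.display.utils import ModelType")
print(py_hunk)  # Hunk(old_start=3, old_count=7, new_start=3, new_count=7)
```

The count difference only gives the net change; a hunk that both adds and removes lines (like the about.py one) needs its "-" and "+" lines counted separately.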