Spaces: Runtime error

Update src/assets/text_content.py

src/assets/text_content.py CHANGED (+11 -8)
@@ -8,19 +8,22 @@ INTRODUCTION_TEXT = f"""
 The data on this page is sourced from a research paper. If you intend to use the data from this page, please remember to cite the following source: https://arxiv.org/abs/2303.07992
 
 We compare the current SOTA traditional KBQA models (fine-tuned (FT) and zero-shot (ZS)),
+
 LLMs in the GPT family, and Other Non-GPT LLM. In QALD-9 and LC-quad2, the evaluation metric used is F1, while other datasets use Accuracy (Exact match).
 
 """
 
 LLM_BENCHMARKS_TEXT = f"""
-ChatGPT is a powerful large language model (LLM) that
-
-growing interest in exploring whether ChatGPT can replace traditional
-
-have been some works analyzing the question answering performance of
-
-In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Microsoft.
-
+ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge.
+
+Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models.
+
+Although there have been some works analyzing the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing of various types of complex questions to analyze the limitations of the model.
+
+In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Microsoft.
+
+We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets.
+
 The total number of test cases is approximately 190,000.
 
 """
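The leaderboard text names two evaluation metrics: F1 for QALD-9 and LC-quad2, and exact-match accuracy for the other datasets. As a rough illustration of what those two metrics compute, here is a minimal Python sketch; the function names and normalization are assumptions for illustration only, not code from this Space's repository.

```python
# Illustrative sketch of the two metrics named in the text: exact-match
# accuracy and set-based F1. Names and normalization are hypothetical,
# not taken from the Space's actual evaluation code.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def answer_f1(predicted_answers, gold_answers):
    """F1 between a predicted answer set and a gold answer set."""
    predicted, gold = set(predicted_answers), set(gold_answers)
    if not predicted or not gold:
        # Both empty counts as a perfect match; one-sided empty scores 0.
        return float(predicted == gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(exact_match_accuracy(["Paris", "Berlin"], ["paris", "Rome"]))  # 0.5
print(answer_f1({"a", "b"}, {"b", "c"}))  # 0.5
```

Set-based F1 fits QALD-9 and LC-quad2 because their questions can have multiple gold answers, where strict exact match would be overly punishing.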