Mohamed Sana committed
Commit f497210 · 1 Parent(s): 5b22a44

remove math modeling

Files changed (2)
  1. app.py +2 -1
  2. src/about.py +2 -3
app.py CHANGED

```diff
@@ -345,7 +345,8 @@ with demo:
         elem_id="citation-button",
         show_copy_button=True,
     )
-    gr.HTML(BOTTOM_LOGO)
+
+    # gr.HTML(BOTTOM_LOGO)
 
     scheduler = BackgroundScheduler()
     scheduler.add_job(restart_space, "interval", seconds=3600)
```
src/about.py CHANGED

```diff
@@ -12,7 +12,7 @@ class Task:
 class Tasks(Enum):
     # # task_key in the json file, metric_key in the json file, name to display in the leaderboard
     tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
-    tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
+    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
     tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")
 
 
@@ -47,7 +47,7 @@ LLM_BENCHMARKS_TEXT = f"""
 
 Large Language Models (LLMs) have the potential to revolutionize the Sixth Generation (6G) communication networks. However, current mainstream LLMs generally lack the specialized
 knowledge in telecom domain. In this paper, for the first time, we propose a pipeline to adapt any general purpose LLMs to a telecom-specific LLMs. We collect and build telecom-specific pretrain dataset, instruction dataset, preference dataset to perform
-continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed three new benchmarks, namely, Telecom Math Modeling, Telecom Open QnA and Telecom Code Tasks.
+continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed two new benchmarks, namely, Telecom Open QnA and 3GPP technical group specification.
 
 These new benchmarks provide a holistic evaluation of the capabilities of LLMs including math modeling, Open-Ended question answering, code generation, infilling, summarization and analysis in telecom domain.
 
@@ -74,7 +74,6 @@ Note 2 ⚠️ : Some models might be widely discussed as subjects of caution by
 We have set up a benchmark using datasets:
 - Telecom Math Modelling : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))
 - Telecom Open QnA : Find more details [here](https://arxiv.org/abs/2310.15051) - (provided by [Huawei Technologies](https://huawei.com))
-- Telecom Code Tasks : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))
 
 To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
```
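The `Task`/`Tasks` pattern this commit edits in src/about.py can be sketched with the standard library only. This is a minimal reconstruction for illustration: the `Task` field names (`benchmark`, `metric`, `col_name`) are assumptions inferred from the constructor calls visible in the diff, not the actual definitions in the repository.

```python
# Sketch of the leaderboard task registry after this commit:
# TELE-EQ is commented out, leaving two active tasks.
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str   # task key in the results json (assumed field name)
    metric: str      # metric key in the results json (assumed field name)
    col_name: str    # display name in the leaderboard (assumed field name)

class Tasks(Enum):
    tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")  # removed by this commit
    tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")

# Downstream code can iterate the enum to build leaderboard columns:
print([t.value.col_name for t in Tasks])  # → ['3GPP-TSG', 'TELE-QnA']
```

Because the member is commented out rather than deleted, the TELE-EQ column simply disappears from any code that iterates `Tasks`, and the line can be restored in a later commit without a revert.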