Mohamed Sana committed on
Commit · f497210
Parent(s): 5b22a44

remove math modeling

Browse files:
- app.py +2 -1
- src/about.py +2 -3
app.py CHANGED

```diff
@@ -345,7 +345,8 @@ with demo:
         elem_id="citation-button",
         show_copy_button=True,
     )
-
+
+    # gr.HTML(BOTTOM_LOGO)

 scheduler = BackgroundScheduler()
 scheduler.add_job(restart_space, "interval", seconds=3600)
```
src/about.py CHANGED

```diff
@@ -12,7 +12,7 @@ class Task:
 class Tasks(Enum):
     # # task_key in the json file, metric_key in the json file, name to display in the leaderboard
     tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
-    tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
+    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
     tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")
```

```diff
@@ -47,7 +47,7 @@ LLM_BENCHMARKS_TEXT = f"""

 Large Language Models (LLMs) have the potential to revolutionize the Sixth Generation (6G) communication networks. However, current mainstream LLMs generally lack the specialized
 knowledge in telecom domain. In this paper, for the first time, we propose a pipeline to adapt any general purpose LLMs to a telecom-specific LLMs. We collect and build telecom-specific pretrain dataset, instruction dataset, preference dataset to perform
-continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed
+continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed two new benchmarks, namely, Telecom Open QnA and 3GPP technical group specification.

 These new benchmarks provide a holistic evaluation of the capabilities of LLMs including math modeling, Open-Ended question answering, code generation, infilling, summarization and analysis in telecom domain.
```

```diff
@@ -74,7 +74,6 @@ Note 2 ⚠️ : Some models might be widely discussed as subjects of caution by
 We have set up a benchmark using datasets:
 - Telecom Math Modelling : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))
 - Telecom Open QnA : Find more details [here](https://arxiv.org/abs/2310.15051) - (provided by [Huawei Technologies](https://huawei.com))
-- Telecom Code Tasks : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))

 To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
```
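The `src/about.py` change works because the leaderboard columns are driven by the members of the `Tasks` enum: commenting out a member removes that benchmark everywhere the app iterates over `Tasks`. A minimal sketch of the pattern, assuming `Task` is a dataclass whose fields follow the inline comment in the file (task key, metric key, display name — the field names here are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str  # task_key in the json file
    metric: str     # metric_key in the json file
    col_name: str   # name to display in the leaderboard

class Tasks(Enum):
    tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")  # disabled by this commit
    tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")

# Wherever the app builds its column list by iterating over the enum,
# the commented-out member simply no longer appears:
cols = [t.value.col_name for t in Tasks]
```

Since enum members iterate in definition order, `cols` is `["3GPP-TSG", "TELE-QnA"]` after the commit — the TELE-EQ (math) column is gone without touching any table-building code.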