Mohamed Sana committed
Commit f497210 · 1 Parent(s): 5b22a44

remove math modeling

Files changed (2)
  1. app.py +2 -1
  2. src/about.py +2 -3
app.py CHANGED

```diff
@@ -345,7 +345,8 @@ with demo:
         elem_id="citation-button",
         show_copy_button=True,
     )
-    gr.HTML(BOTTOM_LOGO)
+
+    # gr.HTML(BOTTOM_LOGO)
 
     scheduler = BackgroundScheduler()
     scheduler.add_job(restart_space, "interval", seconds=3600)
```
src/about.py CHANGED

```diff
@@ -12,7 +12,7 @@ class Task:
 class Tasks(Enum):
     # # task_key in the json file, metric_key in the json file, name to display in the leaderboard
     tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
-    tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
+    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")
     tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")
 
 
@@ -47,7 +47,7 @@ LLM_BENCHMARKS_TEXT = f"""
 
 Large Language Models (LLMs) have the potential to revolutionize the Sixth Generation (6G) communication networks. However, current mainstream LLMs generally lack the specialized
 knowledge in telecom domain. In this paper, for the first time, we propose a pipeline to adapt any general purpose LLMs to a telecom-specific LLMs. We collect and build telecom-specific pretrain dataset, instruction dataset, preference dataset to perform
-continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed three new benchmarks, namely, Telecom Math Modeling, Telecom Open QnA and Telecom Code Tasks.
+continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed two new benchmarks, namely, Telecom Open QnA and 3GPP technical group specification.
 
 These new benchmarks provide a holistic evaluation of the capabilities of LLMs including math modeling, Open-Ended question answering, code generation, infilling, summarization and analysis in telecom domain.
 
@@ -74,7 +74,6 @@ Note 2 ⚠️ : Some models might be widely discussed as subjects of caution by
 We have set up a benchmark using datasets:
 - Telecom Math Modelling : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))
 - Telecom Open QnA : Find more details [here](https://arxiv.org/abs/2310.15051) - (provided by [Huawei Technologies](https://huawei.com))
-- Telecom Code Tasks : Find more details [here](https://arxiv.org/pdf/2407.09424) - (provided by [TII](https://www.tii.ae/))
 
 To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
```
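The `Task`/`Tasks` pattern this commit edits in src/about.py can be sketched with the standard library only. This is a minimal reconstruction for illustration: the `Task` field names (`benchmark`, `metric`, `col_name`) are assumptions inferred from the constructor calls visible in the diff, not the actual definitions in the repository.

```python
# Sketch of the leaderboard task registry after this commit:
# TELE-EQ is commented out, leaving two active tasks.
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str   # task key in the results json (assumed field name)
    metric: str      # metric key in the results json (assumed field name)
    col_name: str    # display name in the leaderboard (assumed field name)

class Tasks(Enum):
    tsg_avg = Task("custom|3gpp:tsg|0", "em", "3GPP-TSG")
    # tele_EQ = Task("custom|telecom:math|0", "em", "TELE-EQ")  # removed by this commit
    tele_QnA = Task("custom|telecom:qna|0", "em", "TELE-QnA")

# Downstream code can iterate the enum to build leaderboard columns:
print([t.value.col_name for t in Tasks])  # → ['3GPP-TSG', 'TELE-QnA']
```

Because the member is commented out rather than deleted, the TELE-EQ column simply disappears from any code that iterates `Tasks`, and the line can be restored in a later commit without a revert.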