AmenRa committed on
Commit
9abeb6e
·
1 Parent(s): 6bdb134
Files changed (1)
  1. src/about.py +11 -11
src/about.py CHANGED
@@ -36,17 +36,17 @@ INTRODUCTION_TEXT = """"""
 LLM_BENCHMARKS_TEXT = f"""
 ## GuardBench Leaderboard
 
-Welcome to the GuardBench Leaderboard, an independent benchmark designed to evaluate guardrail models.
+Welcome to the **GuardBench Leaderboard**, an *independent* benchmark designed to evaluate guardrail models.
 
 The leaderboard reports results for the following datasets:
-- PromptsEN: 30k+ English prompts
-- ResponsesEN: 33k+ English single-turn conversations where the AI-generated response may be safe or unsafe
-- PromptsDE 30k+ German prompts
-- PromptsFR: 30k+ French prompts
-- PromptsIT: 30k+ Italian prompts
-- PromptsES: 30k+ Spanish prompts
-
-Evaluation results are shown in terms of F1.
+- **PromptsEN**: 30k+ English prompts compiled from multiple sources
+- **ResponsesEN**: 33k+ English single-turn conversations from multiple sources where the AI-generated response may be safe or unsafe
+- **PromptsDE**: 30k+ German prompts
+- **PromptsFR**: 30k+ French prompts
+- **PromptsIT**: 30k+ Italian prompts
+- **PromptsES**: 30k+ Spanish prompts
+
+Evaluation **results** are shown in terms of **F1**.
 For a fine-grained evaluation, please see our publications referenced below.
 
 ## Guardrail Models
@@ -54,11 +54,11 @@ Guardrail models are Large Language Models fine-tuned for safety classification,
 By complementing other safety measures such as safety alignment, they aim to prevent generative AI systems from providing harmful information to the users.
 
 ## GuardBench
-GuardBench is a large-scale benchmark for guardrail models comprising 40 safety evaluation datasets that was recently proposed to evaluate their effectiveness.
+GuardBench is a recently proposed large-scale benchmark for evaluating the effectiveness of guardrail models, comprising *40 safety evaluation datasets*.
 You can find more information in the [paper](https://aclanthology.org/2024.emnlp-main.1022) we presented at EMNLP 2024.
 
 ## Python
-GuardBench is accompained by a [Python library](https://github.com/AmenRa/GuardBench) providing evaluation functionalities on top of it.
+GuardBench is supported by a [Python library](https://github.com/AmenRa/GuardBench) that provides evaluation functionality on top of the benchmark.
 
 ## Evaluation Metric
 Evaluation results are shown in terms of F1.
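
For context, the F1 reported here is the standard harmonic mean of precision and recall over binary safe/unsafe labels. The following is a minimal sketch of how such a score could be computed with scikit-learn; the label convention (unsafe as the positive class) and the toy predictions are assumptions for illustration only, not the leaderboard's actual evaluation pipeline.

```python
# Minimal sketch: F1 for a binary safe/unsafe guardrail classification task.
# The label convention (1 = unsafe as the positive class) and the toy data
# below are illustrative assumptions, not GuardBench's evaluation code.
from sklearn.metrics import f1_score

# Ground-truth labels and a hypothetical guardrail model's predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)  # harmonic mean of precision and recall -> 0.750 here
print(f"F1: {score:.3f}")
```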