from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    """A leaderboard task: the benchmark/category name, the metric reported, and the display column name."""
    benchmark: str
    metric: str
    col_name: str


class Tasks(Enum):
    # Each member maps a ThaiSafetyBench category to its ASR column on the leaderboard.
    task0 = Task("overall", "asr", "🔥 Overall ASR ⬇️")
    task1 = Task("Discrimination, Exclusion, Toxicity, Hateful, Offensive", "asr", "Discrimination, Exclusion, Toxicity, Hateful, Offensive ASR ⬇️")
    task2 = Task("Human-Chatbot Interaction Harms", "asr", "Human-Chatbot Interaction Harm ASR ⬇️")
    task3 = Task("Information Hazards", "asr", "Information Hazards ASR ⬇️")
    task4 = Task("Malicious Uses", "asr", "Malicious Uses ASR ⬇️")
    task5 = Task("Misinformation Harms", "asr", "Misinformation Harms ASR ⬇️")
    task6 = Task("Thai Socio-Cultural Harm", "asr", "Thai Socio-Cultural Harms ASR ⬇️")
    task7 = Task("Thai culture related attack", "asr", "Thai Culture Related Attack ASR ⬇️")
    task8 = Task("General prompt attack", "asr", "General Prompt Attack ⬇️")
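
# Illustrative usage (an assumption about how a leaderboard app would consume
# these definitions, not code that exists in this module): the display columns
# are taken from each task's col_name, in enum order.
#
#     COLS = [task.value.col_name for task in Tasks]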


NUM_FEWSHOT = 0


TITLE = """<h1 align="center" id="space-title">ThaiSafetyBench Leaderboard 🔥</h1>"""

INTRODUCTION_TEXT = """
ThaiSafetyBench is a safety benchmark tailored to the Thai language and culture.
"""

LLM_BENCHMARKS_TEXT = f"""
## How it works

We evaluate models on the ThaiSafetyBench benchmark, which consists of tasks covering safety and
harmful content in the Thai language and culture. The evaluation uses the ThaiSafetyBench dataset,
which includes a range of scenarios designed to assess a model's handling of sensitive topics,
discrimination, misinformation, and other harmful content. The automatic evaluation is conducted using the GPT-4o model as a judge.
We report the Attack Success Rate (ASR) for each task, which indicates the model's vulnerability to harmful prompts; lower is better.
We categorize the tasks into two groups: Thai Culture-Related Attacks, which evaluate the model's handling of content specific to Thai culture, including its norms, values, and sensitivities, and General Prompt Attacks, which assess its handling of broadly harmful content that is not unique to Thai culture but remains relevant in a wider context.
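
A minimal sketch of how a task's ASR is computed from per-prompt judge verdicts (`judge_verdicts` is an illustrative name, not part of our evaluation code):

```python
def attack_success_rate(judge_verdicts: list[bool]) -> float:
    # judge_verdicts[i] is True when the judge deems response i a successful attack.
    return sum(judge_verdicts) / len(judge_verdicts)
```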

## Reproducibility

To reproduce our results, we provide the automatic evaluation code in our GitHub repository. You can run the evaluation on your own models by following these steps (a generation sketch follows the list):

1. Generate your model's responses on the ThaiSafetyBench dataset with temperature set to 0.1
2. Use the provided evaluation script to score the responses with the GPT-4o model as a judge
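
A minimal sketch of step 1, assuming a model on the Hugging Face Hub; the model name and prompt loading are placeholders rather than part of our harness:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/your-model")  # placeholder model id
prompts = [...]  # load the ThaiSafetyBench prompts here

responses = [
    generator(p, do_sample=True, temperature=0.1, max_new_tokens=512)[0]["generated_text"]
    for p in prompts
]
```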

## Developers and Maintainers

<Anonymous due to paper submission policy>
"""

SUBMIT_TEXT = """
We welcome submissions of new models to the ThaiSafetyBench leaderboard via email. However, due to the paper submission anonymity policy, we cannot accept submissions at this time. Once that policy no longer applies, email us with the following:

```
Subject: [Your Model Name] ThaiSafetyBench Model Submission

Content:
- Model name
- Developer
- Parameters (in billions)
- Model type (Base or CPT)
- Base model name (if the model is a CPT, otherwise leave empty)
- Release date (YYYY-MM)
- How to run the model (Python code to generate responses if the model is on the Hugging Face Hub; otherwise, a code snippet to run the model and generate responses)
- Contact email (so we can contact you about the evaluation results)
```
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
coming soon...
"""