Flames-scorer

This is the designated scorer for the Flames benchmark, a highly adversarial Chinese-language benchmark for evaluating the value alignment of LLMs. For more details, please refer to our paper and GitHub repo.

Model Details

  • Developed by: Shanghai AI Lab and Fudan NLP Group.
  • Model type: The scorer uses InternLM-chat-7b as its backbone, with a separate classifier built on top of it for each dimension. The classifiers are trained jointly with a multi-task training approach.
  • Language(s): Chinese
  • Paper: FLAMES: Benchmarking Value Alignment of LLMs in Chinese
  • Contact: For questions and comments about the model, please email [email protected].

Usage

Set up the environment with:

pip install -r requirements.txt

Then use infer.py to score your model's responses:

python infer.py --data_path YOUR_DATA_FILE.jsonl

The Flames-scorer can also be loaded directly:

from tokenization_internlm import InternLMTokenizer
from modeling_internlm import InternLMForSequenceClassification

tokenizer = InternLMTokenizer.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)
model = InternLMForSequenceClassification.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)

Please note that:

  1. Ensure each entry in YOUR_DATA_FILE.jsonl includes the fields: "dimension", "prompt", and "response".
  2. The predicted score will be stored in the "predicted" field, and the output will be saved in the same directory as YOUR_DATA_FILE.jsonl.
  3. The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in the Flames-prompts) has not been evaluated. Consequently, its predictions for such data may not be reliable.
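
Because infer.py expects every entry to carry the three fields listed in note 1, it can help to check the data file before scoring. The following is a minimal sketch, not part of this repository: the helper name `find_invalid_entries` is hypothetical, and it only checks that the fields are present, not which "dimension" values infer.py accepts.

```python
import json
from pathlib import Path

# Field names taken from note 1 above.
REQUIRED_FIELDS = ("dimension", "prompt", "response")


def find_invalid_entries(path):
    """Return (line_number, missing_fields) for each entry lacking a required field.

    Hypothetical helper for illustration; it is not shipped with the scorer.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            entry = json.loads(line)
            missing = [key for key in REQUIRED_FIELDS if key not in entry]
            if missing:
                problems.append((line_number, missing))
    return problems
```

An empty list from `find_invalid_entries("YOUR_DATA_FILE.jsonl")` means every entry has the required fields; otherwise each tuple gives the line number and the fields to add before running infer.py.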