RA-IT-NER-zh-7B

Description: The RA-IT-NER-zh-7B model is trained from Qwen1.5-7B using the proposed Retrieval Augmented Instruction Tuning (RA-IT) approach. It can be used for Chinese Open NER with and without RAG. The training data is Sky-NER, an instruction tuning dataset for Chinese Open NER that we constructed. Following the recipe of UniversalNER, we built this dataset from the large-scale SkyPile Corpus by prompting gpt-3.5-turbo-0125 to label the entities in each passage and provide their entity tags. The data collection prompt is as follows:

Instruction:
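给定一段文本,你的任务是抽取所有实体并识别它们的实体类别。输出应为以下JSON格式:[{"实体1": "实体1的类别"}, ...]。
(English: Given a passage of text, your task is to extract all entities and identify their entity types. The output should be in the following JSON format: [{"entity 1": "type of entity 1"}, ...].)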
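
The output format named in the instruction, a JSON list of single-key objects mapping an entity to its type, can be read back into (entity, type) pairs with a few lines of Python. This is only a minimal sketch: the helper name `parse_labels` is hypothetical and not part of the released code, and it assumes the response is well-formed JSON in exactly that shape.

```python
import json
from typing import List, Tuple


def parse_labels(raw: str) -> List[Tuple[str, str]]:
    """Turn '[{"entity": "type"}, ...]' into a list of (entity, type) pairs.

    Hypothetical helper; assumes `raw` is valid JSON in the format
    given by the data collection instruction.
    """
    pairs = []
    for item in json.loads(raw):
        # Each list item is an object; usually it holds a single entity.
        for entity, entity_type in item.items():
            pairs.append((entity, entity_type))
    return pairs


labels = parse_labels('[{"北京": "地点"}, {"张三": "人物"}]')
```

In practice a response may not be valid JSON, so a caller would likely wrap the call in a `try`/`except json.JSONDecodeError` and skip or retry such passages.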

See our paper for more information, and our GitHub repo for how to use the model.

Inference

The template for inference instances is as follows:

Prompting template:
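USER: 以下是一些命名实体识别的例子:{Fill the NER examples here}
ASSISTANT: 我已读完这些例子。
USER: 文本:{Fill the input text here}
ASSISTANT: 我已读完这段文本。
USER: 文本中属于"{Fill the entity type here}"的实体有哪些?
ASSISTANT: (model's predictions in JSON format)

English gloss of the template (the model expects the Chinese text above):
USER: Here are some examples of named entity recognition: {NER examples}
ASSISTANT: I have read these examples.
USER: Text: {input text}
ASSISTANT: I have read this text.
USER: Which entities in the text belong to "{entity type}"?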

Note:

  • The model can run inference with or without NER examples. To run inference without examples, start from the third line of the above template and input "文本:{input text}" directly in the "USER" role.
  • Each inference instance covers one entity type. For multiple entity types, create a separate instance for each type.
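
The two notes above can be sketched in Python as follows. This is a hedged illustration, not the authors' inference code: the function names (`build_turns`, `render`, `build_prompts`) are hypothetical, joining turns with newlines and ending with an open `ASSISTANT:` turn are assumptions, and the format of the NER examples string is left to the caller because the template does not specify it. The GitHub repo remains the reference for the exact conversation format.

```python
from typing import Dict, List, Optional, Tuple

Turn = Tuple[str, str]  # (role, content)


def build_turns(text: str, entity_type: str,
                examples: Optional[str] = None) -> List[Turn]:
    """Build the conversation turns for ONE entity type."""
    turns: List[Turn] = []
    if examples:
        # With examples: the template's first two lines come first.
        turns.append(("USER", f"以下是一些命名实体识别的例子:{examples}"))
        turns.append(("ASSISTANT", "我已读完这些例子。"))
    # Without examples, the conversation starts here (third template line).
    turns.append(("USER", f"文本:{text}"))
    turns.append(("ASSISTANT", "我已读完这段文本。"))
    turns.append(("USER", f'文本中属于"{entity_type}"的实体有哪些?'))
    return turns


def render(turns: List[Turn]) -> str:
    """Render turns as 'ROLE: content' lines (assumed separator: newline)."""
    lines = [f"{role}: {content}" for role, content in turns]
    lines.append("ASSISTANT:")  # left open for the model's prediction
    return "\n".join(lines)


def build_prompts(text: str, entity_types: List[str],
                  examples: Optional[str] = None) -> Dict[str, str]:
    """Create one separate inference instance per entity type."""
    return {t: render(build_turns(text, t, examples)) for t in entity_types}


prompts = build_prompts("张三在北京工作。", ["人物", "地点"])
```

Here `prompts` holds two independent prompts, one asking for "人物" entities and one for "地点" entities, each of which would be sent to the model separately.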

License

This model and its associated data are released under the CC BY-NC 4.0 license. They are intended primarily for research use.

Model size: 7.72B params (Safetensors); tensor type: BF16.