---
license: cc-by-sa-4.0
language:
- ar
tags:
- relation-extraction
- evidence-extraction
- seq2seq
datasets:
- dru-ac/ArSRED
---
# AREEj: Arabic Relation Extraction with Evidence
AREEj extracts relations from Arabic documents. A document can contain multiple relations, and each relation has six elements: the source entity, the target entity, the named-entity type of each, the relation type between them, and the evidence. The evidence serves two purposes: it improves the relation extraction task, and it explains the model's predictions. You can also use it as an edge between the related entities.
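
As a rough illustration of the six elements and of using evidence as an edge, the sketch below stores relations in a small data class and groups them into a graph. The field names, entity-type labels, relation names, and example values are all hand-written assumptions for illustration; they are not AREEj's output schema or actual model output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    # Hypothetical field names; the model card does not define a schema.
    source: str
    source_type: str
    target: str
    target_type: str
    relation_type: str
    evidence: str


def build_graph(relations):
    """Group relations by source; each edge carries its type and evidence."""
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.source].append((rel.target, rel.relation_type, rel.evidence))
    return dict(graph)


# Hand-written values based on the example sentence, not model output.
relations = [
    Relation("المركز العربي للأبحاث ودراسة السياسات", "ORG",
             "الدوحة", "LOC", "headquarters location", "في الدوحة"),
    Relation("الدوحة", "LOC", "قطر", "LOC", "country", "في قطر"),
]

graph = build_graph(relations)
for source, edges in graph.items():
    for target, relation_type, evidence in edges:
        print(f"{source} -[{relation_type} | {evidence}]-> {target}")
```

Keeping the evidence on the edge means a downstream knowledge graph can show why two entities are linked, not only that they are.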

AREEj was introduced in the paper [AREEj: Arabic Relation Extraction with Evidence](https://aclanthology.org/2024.arabicnlp-1.6/), published in the Proceedings of The Second Arabic Natural Language Processing Conference.


### How to use
```
pip install "transformers[torch]" datasets evaluate sentencepiece
```
```python
from transformers import MBartTokenizer, MBartForConditionalGeneration
import torch

# Load the tokenizer and model, and use a GPU when one is available.
tokenizer = MBartTokenizer.from_pretrained('dru-ac/AREEj')
model = MBartForConditionalGeneration.from_pretrained('dru-ac/AREEj')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model.to(device)

def generate_prediction(input_text):
    # Truncation is applied at encoding time; inputs longer than 1024 tokens are cut.
    input_ids = tokenizer.encode(
        input_text, return_tensors="pt", truncation=True, max_length=1024
    ).to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            # Start decoding with the Arabic language code.
            decoder_start_token_id=tokenizer.lang_code_to_id['ar_AR'],
        )

    # Special tokens are kept in the decoded string.
    prediction = tokenizer.decode(output[0], skip_special_tokens=False)

    return prediction

# "The Arab Center for Research and Policy Studies was founded in 2010 in Doha, Qatar."
input_text = 'تأسس المركز العربي للأبحاث ودراسة السياسات في عام 2010 في الدوحة في قطر'
prediction = generate_prediction(input_text)
print('Prediction:', prediction)
```
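
Because the snippet above decodes with `skip_special_tokens=False`, the returned string may still contain sequence-level tokens such as `<s>`, `</s>`, and `<pad>`. The helper below is a minimal sketch for removing only those three tokens while leaving everything else untouched. The function name and the sample string (including `<hypothetical_marker>`) are illustrative assumptions, not part of the model or its documented output format.

```python
import re

# Sequence start, end, and padding tokens used by mBART-style tokenizers.
SEQUENCE_TOKENS = ("<s>", "</s>", "<pad>")


def strip_sequence_tokens(prediction: str) -> str:
    """Remove start/end/padding tokens and collapse leftover whitespace."""
    for token in SEQUENCE_TOKENS:
        prediction = prediction.replace(token, " ")
    return re.sub(r"\s+", " ", prediction).strip()


# Hand-written sample string, not actual model output.
raw = "<s> <hypothetical_marker> نص </s><pad><pad>"
cleaned = strip_sequence_tokens(raw)
print(cleaned)
```

Any other markers in the prediction are preserved, so later parsing can still rely on them.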

### If you use the code or model, please reference this work in your paper:
```
@inproceedings{mraikhat-etal-2024-areej,
    title = "{AREE}j: {A}rabic Relation Extraction with Evidence",
    author = "Rakan Al Mraikhat, Osama  and
      Hamoud, Hadi  and
      Zaraket, Fadi A.",
    editor = "Habash, Nizar  and
      Bouamor, Houda  and
      Eskander, Ramy  and
      Tomeh, Nadi  and
      Abu Farha, Ibrahim  and
      Abdelali, Ahmed  and
      Touileb, Samia  and
      Hamed, Injy  and
      Onaizan, Yaser  and
      Alhafni, Bashar  and
      Antoun, Wissam  and
      Khalifa, Salam  and
      Haddad, Hatem  and
      Zitouni, Imed  and
      AlKhamissi, Badr  and
      Almatham, Rawan  and
      Mrini, Khalil",
    booktitle = "Proceedings of The Second Arabic Natural Language Processing Conference",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.arabicnlp-1.6/",
    doi = "10.18653/v1/2024.arabicnlp-1.6",
    pages = "67--72",
    abstract = "Relational entity extraction is key in building knowledge graphs. A relational entity has a source, a tail and a type. In this paper, we consider Arabic text and introduce evidence enrichment which intuitively informs models for better predictions. Relational evidence is an expression in the text that explains how sources and targets relate. {\%}It also provides hints from which models learn. This paper augments the existing relational extraction dataset with evidence annotation to its 2.9-million Arabic relations. We leverage the augmented dataset to build AREEj, a relation extraction with evidence model from Arabic documents. The evidence augmentation model we constructed to complete the dataset achieved .82 F1-score (.93 precision, .73 recall). The target outperformed SOTA mREBEL with .72 F1-score (.78 precision, .66 recall)."
}
```
### License
This model is licensed under the CC BY-SA 4.0 license. The text of the license can be found [here](https://creativecommons.org/licenses/by-sa/4.0/).