kiddothe2b committed a148e41 (parent: 72e7501)
Update README.md
README.md CHANGED
@@ -7,4 +7,10 @@ sdk: static
 pinned: false
 ---
 
-
+Building on the recent work of Chalkidis, Garneau, et al., "LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development", we release legal NLP resources to broaden legal NLP research, while also helping practitioners who aim to build assistive legal NLP technologies.
+
+As of May 2023, we have released:
+
+- LeXFiles (https://huggingface.co/datasets/lexlms/lex_files), a new, diverse English legal corpus including 11 sub-corpora that cover legislation and case law from 6 primarily English-speaking legal systems (EU, CoE, Canada, US, UK, India). The corpus comprises approx. 6 million documents, which sum to approx. 19 billion tokens.
+- LegalLAMA (https://huggingface.co/datasets/lexlms/legal_lama), a diverse probing benchmark suite comprising 8 sub-tasks that aims to assess how much legal knowledge pre-trained language models (PLMs) acquired during pre-training.
+- Two new legal-oriented PLMs, dubbed LexLMs (https://huggingface.co/models?search=lexlms/legal-roberta), warm-started from RoBERTa models and further pre-trained on LeXFiles for 1M additional steps.
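The kind of probing LegalLAMA performs can be illustrated with a small scoring sketch. It assumes LAMA-style cloze probes: a legal term in a sentence is masked, a masked language model (such as a LexLM) scores the sub-task's candidate terms for that slot, and results are summarised with mean reciprocal rank (MRR) and precision at 1 (P@1). These metric choices, the hypothetical `Probe`, `reciprocal_rank` and `evaluate` names, and the toy sentences and scores are illustrative assumptions, not the released benchmark code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Probe:
    """Hypothetical cloze probe: a masked sentence, its gold term, and a
    model's score for every candidate term of the sub-task."""
    text: str
    gold: str
    candidate_scores: Dict[str, float]


def reciprocal_rank(probe: Probe) -> float:
    # Rank candidates from highest to lowest score; ties keep insertion order.
    ranked: List[str] = sorted(
        probe.candidate_scores, key=probe.candidate_scores.get, reverse=True
    )
    # Score 0.0 when the gold term is not among the candidates at all.
    if probe.gold not in ranked:
        return 0.0
    # Reciprocal of the gold term's 1-based rank.
    return 1.0 / (ranked.index(probe.gold) + 1)


def evaluate(probes: List[Probe]) -> Dict[str, float]:
    rrs = [reciprocal_rank(p) for p in probes]
    return {
        # MRR: average reciprocal rank over all probes.
        "mrr": sum(rrs) / len(rrs),
        # P@1: share of probes whose gold term is ranked first.
        "p@1": sum(rr == 1.0 for rr in rrs) / len(rrs),
    }


# Toy probes with invented scores (not real model outputs).
probes = [
    Probe("The applicant alleged a violation of Article <mask>.",
          gold="6", candidate_scores={"6": 0.7, "8": 0.2, "3": 0.1}),
    Probe("The court found a breach of Article <mask>.",
          gold="8", candidate_scores={"6": 0.5, "8": 0.3, "3": 0.2}),
]
print(evaluate(probes))  # {'mrr': 0.75, 'p@1': 0.5}
```

Ranking only the sub-task's candidate terms, rather than the model's whole vocabulary, keeps the comparison focused on the legal concepts each probe targets; whether LegalLAMA restricts candidates this way is an assumption of this sketch.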