Jón Daðason committed
Commit fb032b4 · 1 parent: 35c6d2a
Updated README.md
README.md CHANGED
@@ -8,14 +8,14 @@ license: cc-by-4.0
 datasets:
 - igc
 - ic3
--
+- jonfd/ICC
 - mc4
 ---
 
 # Nordic ELECTRA-Small
 This model was pretrained on the following corpora:
 * The [Icelandic Gigaword Corpus](http://igc.arnastofnun.is/) (IGC)
-* The
+* The Icelandic Common Crawl Corpus (IC3)
 * The [Icelandic Crawled Corpus](https://huggingface.co/datasets/jonfd/ICC) (ICC)
 * The [Multilingual Colossal Clean Crawled Corpus](https://huggingface.co/datasets/mc4) (mC4) - Icelandic, Norwegian, Swedish and Danish text obtained from .is, .no, .se and .dk domains, respectively
 