Update README.md
README.md
CHANGED
@@ -1,16 +1,6 @@
-
-
-Train data
-
-
-
-- [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)
-- [Yle Finnish News Archive 2019-2020](http://urn.fi/urn:nbn:fi:lb-2021050401)
-- [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001)
-- [The Suomi24 Sentences Corpus](http://urn.fi/urn:nbn:fi:lb-2020021803)
-
-Tested with 1000 samples from the previous datasets:
-Median CER 1.1%
-MEAN CER 4.2%
-
-More detailed info coming later...
+--- |-
+Based on Finnish pretrained T5 model version small-nl24
+Train data
+Around 300k samples from the following datasets - [wikipedia](https://huggingface.co/datasets/wikipedia) - [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501) - [Yle Finnish News Archive 2019-2020](http://urn.fi/urn:nbn:fi:lb-2021050401) - [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001) - [The Suomi24 Sentences Corpus](http://urn.fi/urn:nbn:fi:lb-2020021803)
+Tested with 1000 samples from the previous datasets Median CER 1.1% MEAN CER 4.2%
+More detailed info coming later...
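
The README reports a median and a mean character error rate (CER) but does not say how they were computed. As a rough sketch only, the snippet below assumes a per-sample CER defined as the Levenshtein distance between the reference and the model output divided by the reference length, then aggregated with the median and the mean. The function names and the sample pairs are hypothetical and are not taken from the repository.

```python
from statistics import mean, median


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Per-sample character error rate (assumed definition, see the text above)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical (reference, model output) pairs, for illustration only.
pairs = [
    ("hyvää huomenta", "hyvää huomenta"),
    ("tämä on testi", "tämä on testl"),
    ("suomi", "suomii"),
]

scores = [cer(ref, hyp) for ref, hyp in pairs]
median_cer = median(scores)
mean_cer = mean(scores)
print(f"Median CER {median_cer:.1%}, mean CER {mean_cer:.1%}")
```

Under this definition, a mean well above the median (as in the reported 4.2% versus 1.1%) would be consistent with most samples being nearly correct and a few samples having large errors.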