rmroczkowski committed
Commit 22d7758
1 Parent(s): bd522da
Update README.md

README.md CHANGED
@@ -16,7 +16,7 @@ license: cc-by-4.0
 ---
 
 # plT5 Small
-**plT5** models are T5-based language models trained on Polish corpora.
+**plT5** models are T5-based language models trained on Polish corpora. The models were optimized for the original T5 denoising target.
 
 ## Corpus
 plT5 was trained on six different corpora available for Polish language:
@@ -31,7 +31,7 @@ plT5 was trained on six different corpora available for Polish language:
 | [Wolne Lektury](https://wolnelektury.pl/) | 41M | 5.5k |
 
 ## Tokenizer
-The training dataset was tokenized into subwords using a sentencepiece unigram model with
+The training dataset was tokenized into subwords using a sentencepiece unigram model with
 vocabulary size of 50k tokens.
 
 ## Usage
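
The sentence added in this commit refers to the original T5 denoising target, that is, span corruption: contiguous spans of the input are replaced with sentinel tokens, and the model learns to emit the dropped spans after their sentinels. The following is a minimal sketch of how such an (input, target) pair is built, not the plT5 training code. The helper name `span_corrupt`, the hand-picked spans, and the `<extra_id_N>` sentinel naming (the convention used by the Hugging Face T5 tokenizers) are assumptions made for illustration; real pretraining samples the spans randomly and works on subword ids rather than whitespace-split words.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair from a token list.

    tokens: list of str.
    spans:  sorted, non-overlapping (start, end) index pairs to mask,
            with `end` exclusive. Hand-picked here; sampled in real training.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"  # assumed sentinel naming
        # Keep the unmasked text, then mark the dropped span with a sentinel.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target lists each sentinel followed by the tokens it replaced.
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(corrupted))
print(" ".join(target))
```

Here the spans "for inviting" and "last" are masked, so the encoder input keeps the surrounding words with two sentinels in place of the spans, and the target contains only the masked words, each preceded by its sentinel.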