Update README.md
README.md
CHANGED
@@ -3,13 +3,14 @@ language: eu
 license: cc-by-sa-4.0
 datasets:
 - cc100
+- oscar
 widget:
 - text: "Euria egingo <mask> gaur ?"
 - text: "<mask> umeari liburua eman dio."
 - text: "Zein da zure <mask> ?"
 ---
 
-## RoBERTa Basque
+## RoBERTa Basque small model (Uncased)
 
 ### Prerequisites
 
@@ -17,7 +18,7 @@ transformers==4.19.2
 
 ### Model architecture
 
-This model uses half the size of RoBERTa base
+This model uses approximately half the size of RoBERTa base model parameters.
 
 ### Tokenizer
 
@@ -26,12 +27,13 @@ Using BPE tokenizer with vocabulary size 50,000.
 ### Training Data
 
 * Subset of [CC-100/eu](https://data.statmt.org/cc-100/) : Monolingual Datasets from Web Crawl Data
+* Subset of [oscar](https://huggingface.co/datasets/oscar)
 
 ### Usage
 
 ```python
 from transformers import pipeline
 
-unmasker = pipeline('fill-mask', model='ClassCat/roberta-
+unmasker = pipeline('fill-mask', model='ClassCat/roberta-small-basque')
 unmasker("Zein da zure <mask> ?")
 ```
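
The Usage snippet in the README calls the `fill-mask` pipeline but does not show what to do with the result. The sketch below is one possible way to rank the predictions, not part of this commit. It assumes the input contains a single `<mask>`, in which case the pipeline returns a list of dicts with `score` and `token_str` keys. The helper names `top_predictions` and `unmask` are hypothetical.

```python
from typing import Dict, List, Tuple


def top_predictions(results: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    Assumes `results` is the list of dicts returned for a single-mask input.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    # token_str can carry a leading space from the BPE tokenizer, so strip it.
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in ranked[:k]]


def unmask(text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Run the model from the README on `text` and return the top-k candidates."""
    # Imported lazily so top_predictions can be used without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-small-basque")
    return top_predictions(unmasker(text), k)
```

Calling `unmask("Zein da zure <mask> ?")` downloads the model on first use and returns up to three `(token, score)` tuples, highest score first.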