Update pipeline tag and add library name (#1)
Opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,11 +1,13 @@
 ---
+base_model: google/byt5-small
 language: cs
 license: cc-by-nc-sa-4.0
 tags:
 - Czech
 - GEC
 - GECCC dataset
-
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 # Model Card for byt5-small-geccc-mate
@@ -18,20 +20,20 @@ the MATE method and the [GECCC dataset](https://hdl.handle.net/11234/1-4861).
 
 ## Model Description
 
--
--
--
--
--
-
-
--
+- **Developed by:** [Seznam.cz](https://seznam.cz) and [Charles University, MFF, ÚFAL](https://ufal.mff.cuni.cz/)
+- **Language(s) (NLP):** Czech
+- **Model type:** character-based encoder-decoder Transformer model
+- **Finetuned from model:** `google/byt5-small`
+- **Finetuned on:**
+  - first synthetic errors generated by the MATE method (see [the paper](https://arxiv.org/abs/2506.22402))
+  - then the [GECCC dataset](https://hdl.handle.net/11234/1-4861)
+- **License:** CC BY-NC-SA 4.0
 
 ## Model Sources
 
--
--
--
+- **Repository:** https://github.com/ufal/tsd2025-gec
+- **Paper:** [Refining Czech GEC: Insights from a Multi-Experiment Approach](https://arxiv.org/abs/2506.22402)
+- **Dataset:** [GECCC dataset](https://hdl.handle.net/11234/1-4861)
 
 ## Evaluation
 
@@ -69,8 +71,8 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 
 ```
 @InProceedings{10.1007/978-3-032-02551-7_7,
-author="Pechman, Petr and Straka, Milan and Strakov{\'a}, Jana and
-editor="Ek{\v{s}}tein, Kamil and
+author="Pechman, Petr and Straka, Milan and Strakov{\'a}, Jana and Náplava, Jakub",
+editor="Ek{\v{s}}tein, Kamil and Konopík, Miloslav and Pražák, Ondřej and Pártl, František",
 title="Refining Czech GEC: Insights from a Multi-experiment Approach",
 booktitle="Text, Speech, and Dialogue",
 year="2026",
@@ -80,4 +82,4 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 isbn="978-3-032-02551-7",
 doi="10.1007/978-3-032-02551-7_7"
 }
-```
+```
````
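The metadata fields this change adds (`base_model`, `pipeline_tag`, `library_name`) live in the YAML front matter at the top of README.md. As a minimal sketch, the header from the new side of the diff can be checked with a few lines of standard-library Python. The `parse_front_matter` helper is hypothetical, not part of this PR or of any Hugging Face library, and it handles only the flat `key: value` pairs and simple `- item` lists seen in this header, not YAML in general.

```python
# Front matter copied from the new side of the diff above.
README_HEAD = """\
---
base_model: google/byt5-small
language: cs
license: cc-by-nc-sa-4.0
tags:
- Czech
- GEC
- GECCC dataset
pipeline_tag: text-generation
library_name: transformers
---

# Model Card for byt5-small-geccc-mate
"""


def parse_front_matter(text):
    """Hypothetical helper: parse the small YAML subset used in the header.

    Supports `key: value` scalars and `- item` lists that follow a bare `key:`.
    Returns an empty dict if the text does not start with a `---` line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}

    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter of the front matter
        if not stripped:
            continue
        # A list item belongs to the most recent key that opened a list.
        if stripped.startswith("- ") and isinstance(metadata.get(current_key), list):
            metadata[current_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        current_key = key.strip()
        value = value.strip()
        # A bare `key:` opens a list; otherwise store the scalar value.
        metadata[current_key] = value if value else []
    return metadata


metadata = parse_front_matter(README_HEAD)
added_fields = {
    key: metadata.get(key)
    for key in ("base_model", "pipeline_tag", "library_name")
}
print(added_fields)
```

For real model cards a full YAML parser is the safer choice; this sketch only illustrates which keys the PR introduces and where they sit.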