Update README.md
README.md
CHANGED
|
@@ -7,14 +7,21 @@ metrics:
|
|
| 7 |
model-index:
|
| 8 |
- name: fix_punct_cased_t5_small
|
| 9 |
results: []
|
| 10 |
---
|
| 11 |
|
| 12 |
-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
| 13 |
-
should probably proofread and complete it, then remove this comment. -->
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
This model is a fine-tuned version of [google/t5-v1_1-small](https://huggingface.co/google/t5-v1_1-small) on the None dataset.
|
| 18 |
It achieves the following results on the evaluation set:
|
| 19 |
- Loss: 0.2744
|
| 20 |
- Rouge1: 93.3712
|
| 7 |
model-index:
|
| 8 |
- name: fix_punct_cased_t5_small
|
| 9 |
results: []
|
| 10 |
+
datasets:
|
| 11 |
+
- https://huggingface.co/datasets/nbroad/fix_punctuation
|
| 12 |
+
widget:
|
| 13 |
+
- text: This is, a sentence. with odd punctuation to show off what, the model. can do
|
| 14 |
+
- text: What, should the proper. punctuation. in. this sentence be?
|
| 15 |
+
- text: Where are. we? What, is, the meaning, of this?
|
| 16 |
---
|
| 17 |
+
# fix_punct_cased_t5_small
|
| 18 |
+
This model is a fine-tuned version of [google/t5-v1_1-small](https://huggingface.co/google/t5-v1_1-small) on the [NPR utterances dataset](https://www.kaggle.com/datasets/shuyangli94/interview-npr-media-dialog-transcripts?select=utterances.csv).
|
| 19 |
|
| 20 |
|
| 21 |
+
## Dataset
|
| 22 |
+
The model was trained on 80k rows of NPR radio transcripts from the dataset above. Commas, periods, and semicolons were removed from the text, and then random commas, periods, and semicolons were added. The model was trained to place those three punctuation marks in the correct locations. The casing of the text was not modified during training.
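The corruption step described above can be sketched roughly as follows. This is an illustration, not the author's actual preprocessing script: the function names, the per-word insertion `rate`, and the choice to attach random marks to the ends of words are all assumptions, since the card only says that the three marks were removed and then added back at random.

```python
import random
from typing import Tuple

# The three punctuation marks the card says the model learns to place.
MARKS = ",.;"


def strip_marks(text: str) -> str:
    """Remove every comma, period, and semicolon from the text."""
    return text.translate(str.maketrans("", "", MARKS))


def add_random_marks(text: str, rate: float, rng: random.Random) -> str:
    """Append a random mark to each word with probability `rate` (assumed scheme)."""
    words = text.split()
    noisy = [w + rng.choice(MARKS) if rng.random() < rate else w for w in words]
    return " ".join(noisy)


def make_pair(target: str, rate: float = 0.2, seed: int = 0) -> Tuple[str, str]:
    """Build a (corrupted input, clean target) pair; casing is left untouched."""
    rng = random.Random(seed)
    source = add_random_marks(strip_marks(target), rate, rng)
    return source, target


source, target = make_pair("Where are we? What is the meaning of this, exactly.")
print(source)
print(target)
```

Note that this simple version also strips periods inside tokens such as decimals or abbreviations; whether the original preprocessing did the same is not stated in the card.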
|
| 23 |
+
|
| 24 |
|
| 25 |
It achieves the following results on the evaluation set:
|
| 26 |
- Loss: 0.2744
|
| 27 |
- Rouge1: 93.3712
|