nbroad
/

fix_punct_cased_t5_small

text2text-generation

Generated from Trainer

text-generation-inference

nbroad committed on Sep 29, 2022

Commit

e9d8acf

·

1 Parent(s): eafa309

Update README.md

Files changed (1)

README.md +11 -4

@@ -7,14 +7,21 @@ metrics:
 model-index:
 - name: fix_punct_cased_t5_small
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# fix_punct_cased_t5_small
-This model is a fine-tuned version of [google/t5-v1_1-small](https://huggingface.co/google/t5-v1_1-small) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.2744
 - Rouge1: 93.3712

 model-index:
 - name: fix_punct_cased_t5_small
   results: []
+datasets:
+- https://huggingface.co/datasets/nbroad/fix_punctuation
+widget:
+- text: This is, a sentence. with odd punctuation to show off what, the model. can do
+- text: What, should the proper. punctuation. in. this sentence be?
+- text: Where are. we? What, is, the meaning, of this?
 ---
+# fix_punct_cased_t5_small
+This model is a fine-tuned version of [google/t5-v1_1-small](https://huggingface.co/google/t5-v1_1-small) on the [NPR utterances dataset](https://www.kaggle.com/datasets/shuyangli94/interview-npr-media-dialog-transcripts?select=utterances.csv).
+## Dataset
+The model was trained on 80k rows from the above dataset, which consists of NPR radio transcripts. Commas, periods, and semicolons were removed from the text, and then random commas, periods, and semicolons were inserted. The model was trained to restore those three punctuation marks to their correct locations. The casing of the text was not modified during training.
 It achieves the following results on the evaluation set:
 - Loss: 0.2744
 - Rouge1: 93.3712
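
The data preparation described in the Dataset section (strip commas, periods, and semicolons, then insert random ones) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual preprocessing script: the function names, the per-word insertion rate of 0.2, and the choice to attach punctuation only at word ends are all hypothetical.

```python
import random
import re

# The three punctuation marks the model card says were removed and re-inserted.
PUNCT = ",.;"


def strip_punct(text: str) -> str:
    """Remove every comma, period, and semicolon; casing is left untouched."""
    return re.sub(r"[,.;]", "", text)


def add_random_punct(text: str, rate: float = 0.2, rng=None) -> str:
    """Append a random mark from PUNCT to each word with probability `rate`.

    `rate` is an assumed value; the model card does not state how many
    marks were inserted or where.
    """
    if rng is None:
        rng = random.Random()
    noisy_words = []
    for word in text.split():
        if rng.random() < rate:
            word += rng.choice(PUNCT)
        noisy_words.append(word)
    return " ".join(noisy_words)


def make_pair(text: str, rate: float = 0.2, seed=None):
    """Return (noisy_input, target): the model would learn noisy -> original."""
    rng = random.Random(seed)
    noisy = add_random_punct(strip_punct(text), rate, rng)
    return noisy, text


if __name__ == "__main__":
    source, target = make_pair("Where are we? What is the meaning of this.", seed=0)
    print("input: ", source)
    print("target:", target)
```

Note that stripping every period this way also affects abbreviations and decimal numbers (for example `3.5` becomes `35`), so a real pipeline would likely need extra handling for those cases.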