---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- nl2bash_m
model-index:
- name: t5-v1_1-base-finetuned-English-to-BASH
  results: []
---

# t5-v1_1-base-finetuned-English-to-BASH

Created by: [Josh Shih](https://huggingface.co/Josh98), [Alex Sha](https://huggingface.co/alexsha), and [Kevin Um](https://huggingface.co/kevinum) for EEP 596 - Natural Language Processing at the University of Washington (Seattle).

This model is a fine-tuned version of [google/t5-v1_1-base](https://huggingface.co/google/t5-v1_1-base) on a more balanced iteration of the [NL2BASH](https://github.com/TellinaTool/nl2bash/tree/master/data) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7958
- Nl2bash M: 0.6179

## Training and evaluation data

This model was trained and evaluated on a custom iteration of [NL2BASH](https://github.com/TellinaTool/nl2bash/tree/master/data). The original NL2BASH dataset has a large class imbalance: too many of its bash commands begin with `find`.

A maximum per-command threshold was set; text/BASH pairs for commands above the threshold were removed, and the [GPT-3](https://openai.com/blog/gpt-3-apps/) API was used to generate additional text/BASH pairs for commands below it.

~5500 original text/BASH pairs and ~5700 generated text/BASH pairs were used, giving a total of ~11200 text/BASH pairs. Shown below is the class distribution for the top-5 commands.

![image](https://drive.google.com/uc?export=view&id=1J0b_aIDHOsqfwNyF-RZSnM-GG2X-YNQw)

## Training procedure