Commit d0f1ab0 (verified) by rubenroy · 1 parent: 050d7d3

Update README.md

Files changed (1)
  1. README.md +15 -3
README.md CHANGED
@@ -1,3 +1,15 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - rubenroy/GammaCorpus-v2-100k
+ language:
+ - en
+ tags:
+ - gammacorpus
+ ---
+
+ # GPT2 GammaCorpus v2 100k
+ This is a GPT-2 language model fine-tuned on the GammaCorpus v2 - 100k dataset, which consists of 100,000 structured user-assistant conversational pairs. The model was initialised from the pretrained gpt2 weights and trained for 2 epochs with a maximum sequence length of 256 tokens, a batch size of 2 (with gradient accumulation), and a learning rate of 5e-5.
+ The tokenizer is the original GPT-2 tokenizer, with the EOS token also used as the pad token. The training objective was causal language modeling.
+
+ Link to training dataset: https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-100k
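
The hyperparameters listed in the README could be wired into a Hugging Face `transformers` fine-tuning run roughly as sketched below. This is an illustration, not the author's training script: the gradient accumulation step count (8), the `output_dir` value, and the helper names `FinetuneConfig`, `optimizer_steps`, and `build_trainer_inputs` are assumptions. The README states only the base checkpoint, epochs, sequence length, batch size, learning rate, and the EOS-as-pad tokenizer choice.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    # Values stated in the README.
    base_model: str = "gpt2"
    num_examples: int = 100_000
    epochs: int = 2
    max_length: int = 256
    batch_size: int = 2
    learning_rate: float = 5e-5
    # ASSUMPTION: the README does not give the accumulation step count.
    grad_accum_steps: int = 8

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update on one device.
        return self.batch_size * self.grad_accum_steps


def optimizer_steps(cfg: FinetuneConfig) -> int:
    """Total optimizer updates on one device, ignoring any held-out split."""
    steps_per_epoch = math.ceil(cfg.num_examples / cfg.effective_batch_size)
    return steps_per_epoch * cfg.epochs


def build_trainer_inputs(cfg: FinetuneConfig, output_dir: str = "gpt2-gammacorpus"):
    """Return (tokenizer, model, training_args); needs `transformers` and `torch`."""
    # Imported lazily so the arithmetic helpers above work without the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

    tokenizer = AutoTokenizer.from_pretrained(cfg.base_model)
    # GPT-2 ships without a pad token, so reuse EOS as the README describes.
    tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(cfg.base_model)
    model.config.pad_token_id = tokenizer.eos_token_id

    args = TrainingArguments(
        output_dir=output_dir,  # hypothetical path
        num_train_epochs=cfg.epochs,
        per_device_train_batch_size=cfg.batch_size,
        gradient_accumulation_steps=cfg.grad_accum_steps,
        learning_rate=cfg.learning_rate,
    )
    return tokenizer, model, args


if __name__ == "__main__":
    cfg = FinetuneConfig()
    print(cfg.effective_batch_size, optimizer_steps(cfg))
```

When tokenizing the conversations, `cfg.max_length` would typically be passed as `tokenizer(text, truncation=True, max_length=cfg.max_length)` so that sequences are capped at 256 tokens. Changing `grad_accum_steps` alters only the effective batch size and the number of optimizer updates, not the per-device memory footprint.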