Commit 5422e45 by maxkm · verified · 1 Parent(s): 20eb232

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -9,6 +9,12 @@ datasets:
 ---
 
 # BPE_GPT2_TinyStoriesV2_cleaned
+BPE Tokenizer Model for dataset 'fhswf/TinyStoriesV2_cleaned'
 
-## Model Description
-BPE Tokenizer Model for dataset 'fhswf/TinyStoriesV2_cleaned'
+Based on get-neo BPE Tokenizer, but with a smaller vocabulary.
+Trained with TinyStoriesV2.
+
+- Vocab Size: 1024
+- 256 Base chars
+- 1 extra Token: <|endoftext|>
+- 3839 merges
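
Review note: the four figures in the added list can be cross-checked with a little arithmetic. The sketch below assumes the usual byte-level BPE accounting, in which the vocabulary is the base alphabet plus the special tokens plus one new token per merge. That accounting is an assumption, not something the commit states, and the function names are hypothetical helpers for illustration. Under it, the listed counts sum to 4096 rather than 1024, and a 1024-token vocabulary would correspond to 767 merges. The commit does not say which figure is the intended one.

```python
def bpe_vocab_size(base_tokens: int, special_tokens: int, merges: int) -> int:
    """Vocabulary size, assuming each merge adds exactly one new token."""
    return base_tokens + special_tokens + merges


def merges_for_vocab(vocab_size: int, base_tokens: int, special_tokens: int) -> int:
    """Number of merges needed to reach vocab_size under the same assumption."""
    merges = vocab_size - base_tokens - special_tokens
    if merges < 0:
        raise ValueError("vocab_size is smaller than base + special tokens")
    return merges


if __name__ == "__main__":
    # Figures as listed in the README change: 256 base chars, 1 special token, 3839 merges.
    print(bpe_vocab_size(256, 1, 3839))   # 4096
    # Merges implied by the stated vocabulary size of 1024.
    print(merges_for_vocab(1024, 256, 1))  # 767
```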