---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---

# BPE_GPT2_TinyStoriesV2_cleaned

BPE tokenizer model for the dataset 'fhswf/TinyStoriesV2_cleaned'.

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.

Trained on TinyStoriesV2.

- Vocab size: 1024
  - 256 base characters
  - 1 extra token: `<|endoftext|>`
  - 767 merges
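
The vocabulary breakdown above can be checked with a few lines of arithmetic. This is only a sketch: the function name `merges_for_vocab` and its parameters are hypothetical and are not part of the tokenizer or of any library.

```python
# Illustrative arithmetic only. The function name and its defaults are
# assumptions made for this example, not part of the tokenizer's API.
def merges_for_vocab(vocab_size: int, base_symbols: int = 256, special_tokens: int = 1) -> int:
    """Return how many BPE merges fit in a vocabulary of the given size."""
    merges = vocab_size - base_symbols - special_tokens
    if merges < 0:
        raise ValueError("vocab_size is too small for the base symbols and special tokens")
    return merges


# 1024 total entries - 256 base characters - 1 special token = 767 merges
print(merges_for_vocab(1024))
```

Each learned merge adds exactly one entry to the vocabulary, so the merge count is whatever remains after the base characters and the special token are accounted for.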