💡 TinyStories-GPT2-10k

TinyStories-GPT2-10k is a lightweight, decoder-only transformer model trained from scratch on a tokenized version of the TinyStories dataset. It uses a custom Byte Pair Encoding (BPE) tokenizer with a vocabulary size of 10,000 tokens, making it well-suited for experiments in efficient language modeling, scaling laws, and low-resource fine-tuning.


🧠 Model Architecture

This model follows the core GPT-2 architectural principles with a few simplifications to reduce parameter count and training cost.

| Component | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT-like) |
| Layers | 8 |
| Embedding Size | 128 |
| Attention Heads | 16 |
| Feedforward Size | 512 (4× expansion) |
| Sequence Length | 1024 |
| Vocabulary Size | 10,000 |
| Total Parameters | ~2.99M |
| Dropout / Bias | None (disabled for simplicity) |
| Weight Tying | ✅ Enabled (input/output embeddings) |
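
For orientation, the hyperparameters above map roughly onto a Hugging Face `GPT2Config`. The sketch below is an approximation, not the author's training setup: the repository ships its own `config.yaml`, and stock GPT-2 keeps the biases and dropout modules that this model disables.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Approximate GPT2Config equivalent of the architecture table above.
# The field mapping is an assumption; the actual training run is driven by config.yaml.
config = GPT2Config(
    vocab_size=10_000,         # custom BPE vocabulary
    n_positions=1024,          # sequence length
    n_embd=128,                # embedding size
    n_layer=8,                 # decoder layers
    n_head=16,                 # attention heads (head dim = 128 / 16 = 8)
    n_inner=512,               # feedforward size (4x expansion)
    resid_pdrop=0.0,           # dropout disabled everywhere
    embd_pdrop=0.0,
    attn_pdrop=0.0,
    tie_word_embeddings=True,  # share input/output embedding weights
)

model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")  # ~3.0M, in line with the ~2.99M above
```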

Initialization

Weights were initialized from a normal distribution \( \mathcal{N}(0, 0.02) \), with the residual-path projections additionally scaled by \( \frac{1}{\sqrt{2N}} \), where \( N = 8 \) is the number of decoder layers, following GPT-2's residual accumulation strategy.
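
A minimal sketch of that scheme in plain PyTorch is shown below. It assumes the residual-path output projections are attributes named `c_proj` (a GPT-2-style naming convention), which may differ from the author's actual module names.

```python
import math
import torch.nn as nn

N_LAYERS = 8  # number of decoder layers

def init_weights(module: nn.Module) -> None:
    """Initialize Linear/Embedding weights from N(0, 0.02)."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if isinstance(module, nn.Linear) and module.bias is not None:
            nn.init.zeros_(module.bias)

def scale_residual_projections(model: nn.Module) -> None:
    """Shrink residual-path projections to std = 0.02 / sqrt(2N)."""
    for name, param in model.named_parameters():
        if name.endswith("c_proj.weight"):  # assumed naming convention
            nn.init.normal_(param, mean=0.0, std=0.02 / math.sqrt(2 * N_LAYERS))

# Usage: model.apply(init_weights); scale_residual_projections(model)
```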


🧪 Training Configuration

| Setting | Value |
|---|---|
| Dataset | TinyStories-tokenized-10k |
| Tokenizer | Custom BPE (10k vocab) |
| Training Tokens | 459M |
| Validation Tokens | 4.6M |
| Max Tokens Seen | ~1.37B |
| Epochs | 3 |
| Batch Size | 48 × 512 |
| Optimizer | AdamW |
| Learning Rate | 0.06 (linear decay, warmup = 256 steps) |
| Betas | (0.9, 0.95) |
| Weight Decay | 0.1 |
| Device | A100 GPU |
| Training Time | ~72 minutes |
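
The optimizer and learning-rate schedule can be set up roughly as follows. This is a hedged sketch: the original training loop is not published here, so the decay shape, the total step count (estimated from ~1.37B tokens at 48 × 512 tokens per step), and the `build_optimizer` helper are assumptions.

```python
import torch

PEAK_LR = 0.06
WARMUP_STEPS = 256
TOTAL_STEPS = 56_000  # ~1.37e9 tokens / (48 * 512 tokens per step), rounded

def build_optimizer(model: torch.nn.Module):
    """AdamW with betas (0.9, 0.95) and weight decay 0.1, plus linear warmup and decay."""
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=PEAK_LR, betas=(0.9, 0.95), weight_decay=0.1
    )

    def lr_lambda(step: int) -> float:
        # Linear warmup to the peak LR over 256 steps, then linear decay.
        if step < WARMUP_STEPS:
            return step / max(1, WARMUP_STEPS)
        return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```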

Performance

| Metric | Value |
|---|---|
| Initial Loss | 9.23 |
| Final Train Loss | 4.98 |
| Best Validation Loss | 4.69 |
| Overfitting | ❌ Not observed |

🔑 Tokenizer

The model was trained with a custom BPE tokenizer built from the TinyStories dataset using the Hugging Face `tokenizers` library. The vocabulary was capped at 10,000 tokens and the tokenizer saved as `bpe-tokenizer_tinystories.json`.
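
A tokenizer like this can be built with the `tokenizers` library along the lines of the sketch below; the byte-level pre-tokenizer, the `<|endoftext|>` special token, and the `tinystories.txt` input path are illustrative assumptions rather than the author's documented settings.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a BPE tokenizer capped at 10,000 tokens on the raw TinyStories text.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)  # assumed pre-tokenization

trainer = trainers.BpeTrainer(
    vocab_size=10_000,
    special_tokens=["<|endoftext|>"],  # assumed special token
)

tokenizer.train(files=["tinystories.txt"], trainer=trainer)  # placeholder corpus path
tokenizer.save("bpe-tokenizer_tinystories.json")
```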


📦 Files Included

- `best_model.pt`: Final model weights
- `bpe-tokenizer_tinystories.json`: BPE tokenizer (10k vocab)
- `config.yaml`: Architecture and training configuration
- `loss_history.json`: Per-epoch training losses
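
The raw artifacts can be inspected directly, as in the sketch below. It assumes `best_model.pt` is a `torch.save()`-style checkpoint and `loss_history.json` is plain JSON; their exact internal layouts are not documented on this card.

```python
import json
import torch

# Load the checkpoint on CPU; the key layout depends on the author's training code.
checkpoint = torch.load("best_model.pt", map_location="cpu")
print(type(checkpoint))

# Per-epoch training losses.
with open("loss_history.json") as f:
    loss_history = json.load(f)
print(loss_history)
```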

🔗 Related Resources

- Dataset: TinyStories-tokenized-10k (the TinyStories corpus tokenized with the custom 10k-vocab BPE tokenizer)

🚀 Inference Example

```python
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained(
    "KabirBakhshaei/TinyStories-GPT2-10k",
    tokenizer_file="bpe-tokenizer_tinystories.json",
)
model = GPT2LMHeadModel.from_pretrained("KabirBakhshaei/TinyStories-GPT2-10k")

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=True is required for temperature/top_k to take effect;
# otherwise generate() falls back to greedy decoding.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.9, top_k=10)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```