🍼 BabyLangModel

A tiny GPT-style language model trained from scratch on the TinyStories dataset. Built in PyTorch with a custom architecture inspired by nanoGPT, and trained for 200,000 iterations on a consumer GPU (RTX 4060).


🧠 Model Details

  • Architecture: GPT (custom implementation)
  • Parameters: ~30M total (F32; ~10.6M excluding token embeddings)
  • Layers: 6
  • Heads: 6
  • Embedding Size: 384
  • Block Size: 128
  • Tokenizer: GPT-2 (tiktoken)
  • Training Steps: 200,000
  • Training Loss: ~1.80
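
As a sanity check, the parameter count follows directly from the hyperparameters above. A rough back-of-the-envelope sketch (layer norms and biases ignored):

```python
# Rough GPT parameter estimate from the hyperparameters listed above.
vocab_size, block_size = 50257, 128
n_layer, n_embd = 6, 384

tok_emb = vocab_size * n_embd        # token embedding table
pos_emb = block_size * n_embd        # learned positional embeddings
attn = 4 * n_embd * n_embd           # Q, K, V, and output projections
mlp = 2 * n_embd * (4 * n_embd)      # two linear layers, 4x hidden width
per_layer = attn + mlp

total = tok_emb + pos_emb + n_layer * per_layer
print(f"~{total / 1e6:.1f}M parameters")  # ~30.0M with embeddings, ~10.6M without
```

Most of the budget sits in the 50257 × 384 embedding table; the transformer blocks themselves are only ~10.6M parameters.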

📚 Training Data

We trained on the open-source TinyStories dataset by Microsoft Research. It's a dataset of short, simple English stories written for young children (ages 2–4).

  • Clean, simple narratives
  • Ideal for small model generalization
  • 100% open and publicly available

🧰 Usage (with transformers)

This model uses a custom architecture, so you need to pass `trust_remote_code=True`:

from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)

✨ Sample Generation

Prompt: Once upon a time there was a tiny robot who

Output: ...lived in a far away home. One day, a little girl named Lily decided to go on a special trip in the forest. She walked and walked until she got there but suddenly she started to go. Her mom called her and said, "Don't worry, Lily. We will get you my special ride."

🗣️ Still improving, but quite readable and story-like after 200k iterations!


💻 Train It Yourself

You can find the full training code on GitHub or use this structure:

python -m src.tokenizer      # Tokenize TinyStories
python -m src.train          # Train model from scratch
python -m src.generate       # Generate text

You’ll also find:

  • Checkpointing & resume support
  • Configurable hyperparameters
  • Gradient accumulation & mixed precision
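
For reference, gradient accumulation in a PyTorch loop typically looks like the minimal sketch below. This is illustrative only: the toy model and the `accum_steps` name are hypothetical, and the actual training loop lives in the repository's `src.train`.

```python
import torch
import torch.nn as nn

# Toy stand-in model; the real loop trains the GPT defined in src/.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4  # hypothetical: micro-batches accumulated per optimizer step

optimizer.zero_grad()
for step in range(8):
    x, y = torch.randn(2, 16), torch.randn(2, 1)
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()   # scale so gradients average over micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one real update every accum_steps micro-batches
        optimizer.zero_grad()
```

Accumulation lets a small GPU simulate a larger effective batch size; mixed precision would additionally wrap the forward pass in an autocast context.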

🔧 Config Used

{
  "vocab_size": 50257,
  "block_size": 128,
  "n_layer": 6,
  "n_head": 6,
  "n_embd": 384,
  "model_type": "gpt"
}
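
That JSON maps naturally onto a small config object. A sketch of how it might be parsed on the Python side (the `GPTConfig` name here is illustrative, not necessarily the class used in the repo):

```python
import json
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int
    block_size: int
    n_layer: int
    n_head: int
    n_embd: int
    model_type: str = "gpt"

raw = '''{"vocab_size": 50257, "block_size": 128, "n_layer": 6,
          "n_head": 6, "n_embd": 384, "model_type": "gpt"}'''
cfg = GPTConfig(**json.loads(raw))

# Each attention head gets an equal slice of the embedding dimension.
assert cfg.n_embd % cfg.n_head == 0
head_dim = cfg.n_embd // cfg.n_head  # 384 / 6 = 64
```

Note that `n_embd` must divide evenly by `n_head`, which this config satisfies (64-dimensional heads).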

📦 Inference Notes

Load the model with the same `AutoModel.from_pretrained(..., trust_remote_code=True)` call shown in the Usage section above.

You can also upload a tokenizer later for full text input support (e.g. with tiktoken).


πŸ§‘β€πŸ’» Author

Exquisique: GenAI explorer, poetic dreamer, and neural model whisperer.


📜 License

MIT: open source; fine-tune and remix freely. ✨
