MiniGPT – Lightweight Transformer for Text Generation

MiniGPT is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.

Hosted with ❤️ by @Austin207


Model Description

MiniGPT is a small, word-level transformer model with the following architecture (a rough parameter-count sketch follows this list):

  • 4 Transformer layers
  • 4 Attention heads
  • 128 Embedding dimensions
  • 512 FFN hidden size
  • Max sequence length: 128
  • Word-level tokenizer (trained with Hugging Face tokenizers)
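
As a rough sanity check, these hyperparameters can be turned into a back-of-the-envelope parameter count. The sketch below is illustrative only: the vocabulary size is a hypothetical placeholder, and the real total (quoted as ~4.6M in the model card further down) depends on the trained vocabulary and on whether the output head shares weights with the token embedding.

# Back-of-the-envelope parameter count from the hyperparameters above.
# vocab_size is a placeholder; use tokenizer.get_vocab_size() for the real value.
embed_dim, ff_dim, num_layers, max_seq_len = 128, 512, 4, 128
vocab_size = 15_000  # assumption for illustration only

attention = 4 * (embed_dim * embed_dim + embed_dim)            # Q, K, V, output projections
ffn = 2 * embed_dim * ff_dim + ff_dim + embed_dim              # two feed-forward linear layers
layer_norms = 2 * 2 * embed_dim                                # two LayerNorms per block
per_layer = attention + ffn + layer_norms

embeddings = vocab_size * embed_dim + max_seq_len * embed_dim  # token + learned position embeddings (assumed)
lm_head = vocab_size * embed_dim + vocab_size                  # untied output projection (assumed)

total = num_layers * per_layer + embeddings + lm_head
print(f"~{total / 1e6:.1f}M parameters")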

Despite its size, it supports advanced generation strategies (sketched in code after this list), including:

  • Repetition Penalty
  • Temperature Sampling
  • Top-K & Top-P (nucleus) sampling
  • Real-time streaming output
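
These options combine in a single per-token sampling step. The sketch below shows one common way to implement them in PyTorch; it is illustrative only, not the code in inference.py, and the function name sample_next_token and its default values are assumptions.

import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids,
                      temperature=1.0, top_k=50, top_p=0.9,
                      repetition_penalty=1.2):
    # logits: 1-D tensor of next-token logits, shape [vocab_size]
    # generated_ids: list of token ids produced so far
    logits = logits.clone()

    # Repetition penalty: push down tokens that have already appeared
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it
    logits = logits / max(temperature, 1e-8)

    # Top-k: drop everything outside the k most likely tokens
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability exceeds top_p
    if 0.0 < top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        to_remove = cumulative > top_p
        to_remove[1:] = to_remove[:-1].clone()  # always keep the single most likely token
        to_remove[0] = False
        logits[sorted_idx[to_remove]] = float("-inf")

    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()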

Usage

Install dependencies:

pip install torch tokenizers

(or install from the repository's requirements.txt with pip install -r requirements.txt)

Load the model and tokenizer:

from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch

# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")

# Load model
model = MiniGPT(
    vocab_size=tokenizer.get_vocab_size(),
    embed_dim=128,
    num_heads=4,
    ff_dim=512,
    num_layers=4,
    max_seq_len=128
)

checkpoint = torch.load("model_checkpoint_step20000.pt", map_location="cpu")  # load on CPU; move to GPU afterwards if desired
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Generate text
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
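
To see roughly what generate_stream does under the hood, here is a bare-bones decoding loop built on the sample_next_token sketch from the Model Description section. It is illustrative only and assumes the model's forward pass returns raw logits of shape [batch, seq_len, vocab_size], which may not match the actual inference.py.

ids = tokenizer.encode(prompt).ids                     # word-level token ids for the prompt
with torch.no_grad():
    for _ in range(60):                                # max_new_tokens
        context = torch.tensor([ids[-128:]])           # crop context to max_seq_len
        logits = model(context)[0, -1]                 # next-token logits (assumed output shape)
        next_id = sample_next_token(logits, ids)       # sampling sketch defined earlier
        ids.append(next_id)
print(tokenizer.decode(ids))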

Training

Train from scratch on any plain-text dataset:

python training.py

Training includes:

  • Checkpointing
  • Sample generation previews
  • Word-level tokenization with tokenizers (see the sketch after this list)
  • Custom datasets via alphabetical_dataset.txt or your own
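
The wordlevel.json tokenizer can be reproduced with the Hugging Face tokenizers library roughly as follows. This is a minimal sketch: the special tokens shown are assumptions, and the actual setup in Tokenizer.py / training.py may differ.

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

# Build an untrained word-level tokenizer with an unknown-token fallback
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Learn the vocabulary from the raw text file(s)
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])  # assumed special tokens
tokenizer.train(["alphabetical_dataset.txt"], trainer)

# Save in the same JSON format loaded in the Usage section
tokenizer.save("wordlevel.json")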

Files in This Repository

File                       Purpose
miniGPT.py                 Core Transformer model
transformer.py             Transformer block logic
multiheadattention.py      Multi-head attention module
Tokenizer.py               Tokenizer loader
training.py                Training loop
inference.py               CLI and streaming generation
dataprocess.py             Text preprocessing tools
wordlevel.json             Trained word-level tokenizer
alphabetical_dataset.txt   Sample dataset
requirements.txt           Required dependencies

Model Card

Property       Value
Model Type     Decoder-only GPT
Size           Small (~4.6M params)
Trained On     Word-level dataset (custom)
Intended Use   Text generation, educational demo
License        MIT

Intended Use and Limitations

This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.


Contributions

We welcome improvements, bug fixes, and new features!

# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature

Then open a pull request!


License

This project is licensed under the MIT License.


Explore More

  • Based on GPT architecture from OpenAI
  • Inspired by karpathy/nanoGPT
  • Compatible with Hugging Face tools and tokenizer ecosystem