MiniGPT: Lightweight Transformer for Text Generation
MiniGPT is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.
Hosted with ❤️ by @Austin207
Model Description
MiniGPT is a small, word-level transformer model with the following architecture:
- 4 Transformer layers
- 4 Attention heads
- 128 Embedding dimensions
- 512 FFN hidden size
- Max sequence length: 128
- Word-level tokenizer (trained with the Hugging Face tokenizers library)
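As a rough sanity check on the model-card figure of ~4.6M parameters, the transformer stack itself accounts for well under 1M of them; the remainder sits in the vocabulary-dependent embedding and output layers. A back-of-the-envelope estimate for the configuration above, ignoring biases, layer norms, and positional embeddings:

```python
# Rough parameter estimate for the configuration listed above
embed_dim, ff_dim, num_layers = 128, 512, 4

# Per layer: Q, K, V, O projections (4 * d^2) plus the two FFN matrices (2 * d * ff_dim)
per_layer = 4 * embed_dim**2 + 2 * embed_dim * ff_dim   # 196,608
transformer_params = num_layers * per_layer             # 786,432

# The remaining parameters come from the token embedding and output projection,
# each roughly vocab_size * embed_dim, so the total depends on the trained tokenizer.
print(f"Transformer stack: ~{transformer_params:,} parameters")
```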
Despite its size, it supports advanced generation strategies (sketched after this list), including:
- Repetition Penalty
- Temperature Sampling
- Top-K & Top-P (nucleus) sampling
- Real-time streaming output
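All four strategies operate on the logits the model produces for the next token. The following is a minimal, self-contained sketch of how they are typically combined; it is illustrative rather than the repository's actual sampling code, and the function name and default values are assumptions:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids, temperature=1.0, top_k=50,
                      top_p=0.9, repetition_penalty=1.2):
    """Filter a 1-D logits tensor and sample one token id (illustrative helper)."""
    logits = logits.clone()
    # Repetition penalty: discourage tokens that have already been generated
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty
    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it
    logits = logits / temperature
    # Top-k: keep only the k most likely tokens
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability exceeds top_p
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    to_remove = cumulative > top_p
    to_remove[1:] = to_remove[:-1].clone()  # shift right so the threshold-crossing token is kept
    to_remove[0] = False
    logits[sorted_idx[to_remove]] = float("-inf")
    # Sample from the filtered distribution
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Lower temperatures and smaller top-k/top-p values make output more conservative, while the repetition penalty helps suppress the short word-level loops small models are prone to.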
Usage
Install dependencies:
pip install torch tokenizers
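Alternatively, install everything listed in requirements.txt:
pip install -r requirements.txt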
Load the model and tokenizer:
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch
# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")
# Load model
model = MiniGPT(
vocab_size=tokenizer.get_vocab_size(),
embed_dim=128,
num_heads=4,
ff_dim=512,
num_layers=4,
max_seq_len=128
)
checkpoint = torch.load("model_checkpoint_step20000.pt")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# Generate text
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
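Under the hood, streaming generation is an autoregressive loop: encode the prompt, repeatedly feed the most recent context window, sample one token, and print it immediately. The sketch below illustrates that pattern, reusing the sample_next_token helper sketched earlier; it assumes the model's forward pass returns logits of shape (batch, seq_len, vocab_size) and may differ from the actual generate_stream implementation in inference.py:

```python
import torch

@torch.no_grad()
def stream_generate(model, tokenizer, prompt, max_new_tokens=60, **sampling_kwargs):
    # Encode the prompt into word-level token ids
    ids = tokenizer.encode(prompt).ids
    for _ in range(max_new_tokens):
        # Feed at most the model's maximum context window (128 tokens)
        context = torch.tensor([ids[-128:]])
        logits = model(context)  # assumed shape: (1, seq_len, vocab_size)
        next_id = sample_next_token(logits[0, -1], ids, **sampling_kwargs)
        ids.append(next_id)
        # Print each new word as soon as it is sampled
        print(tokenizer.decode([next_id]), end=" ", flush=True)
    print()
```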
Training
Train from scratch on any plain-text dataset:
python training.py
Training includes:
- Checkpointing
- Sample generation previews
- Word-level tokenization with the Hugging Face tokenizers library (a tokenizer-training sketch follows this list)
- Custom datasets via alphabetical_dataset.txt or your own
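To prepare a tokenizer for your own dataset, the tokenizers library can train a word-level vocabulary directly from a plain-text file. The sketch below shows one reasonable setup; the unknown-token symbol, special tokens, and whitespace pre-tokenizer are assumptions, and the repository's Tokenizer.py may configure these differently:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Word-level model with a fallback token for out-of-vocabulary words
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on a plain-text corpus and save in the format the Usage section loads
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(["alphabetical_dataset.txt"], trainer)
tokenizer.save("wordlevel.json")
```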
Files in This Repository
File | Purpose |
---|---|
miniGPT.py | Core Transformer model |
transformer.py | Transformer block logic |
multiheadattention.py | Multi-head attention module |
Tokenizer.py | Tokenizer loader |
training.py | Training loop |
inference.py | CLI and streaming generation |
dataprocess.py | Text preprocessing tools |
wordlevel.json | Trained word-level tokenizer |
alphabetical_dataset.txt | Sample dataset |
requirements.txt | Required dependencies |
Model Card
Property | Value |
---|---|
Model Type | Decoder-only GPT |
Size | Small (~4.6M params) |
Trained On | Word-level dataset (custom) |
Intended Use | Text generation, educational demo |
License | MIT |
Intended Use and Limitations
This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.
Contributions
We welcome improvements, bug fixes, and new features!
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
Then open a pull request!
License
This project is licensed under the MIT License.
Explore More
- Based on the GPT architecture from OpenAI
- Inspired by karpathy/nanoGPT
- Compatible with Hugging Face tools and tokenizer ecosystem