MiniGPT: Lightweight Transformer for Text Generation
MiniGPT is a minimal yet powerful GPT-style language model built from scratch using PyTorch. It is designed for educational clarity, customization, and efficient real-time text generation. This project demonstrates the full training and inference pipeline of a decoder-only transformer architecture, including streaming capabilities and modern sampling strategies.
Hosted with ❤️ by @Austin207
Model Description
MiniGPT is a small, word-level transformer model with the following architecture:
- 4 Transformer layers
- 4 Attention heads
- 128 Embedding dimensions
- 512 FFN hidden size
- Max sequence length: 128
- Word-level tokenizer (trained with the Hugging Face tokenizers library)
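As a rough sanity check on the model-card figure of ~4.6M parameters, the transformer stack itself accounts for well under 1M of them; the remainder sits in the vocabulary-dependent embedding and output layers. A back-of-the-envelope estimate for the configuration above, ignoring biases, layer norms, and positional embeddings:

```python
# Rough parameter estimate for the configuration listed above
embed_dim, ff_dim, num_layers = 128, 512, 4

# Per layer: Q, K, V, O projections (4 * d^2) plus the two FFN matrices (2 * d * ff_dim)
per_layer = 4 * embed_dim**2 + 2 * embed_dim * ff_dim   # 196,608
transformer_params = num_layers * per_layer             # 786,432

# The remaining parameters come from the token embedding and output projection,
# each roughly vocab_size * embed_dim, so the total depends on the trained tokenizer.
print(f"Transformer stack: ~{transformer_params:,} parameters")
```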
Despite its size, it supports advanced generation strategies (sketched after this list), including:
- Repetition Penalty
- Temperature Sampling
- Top-K & Top-P (nucleus) sampling
- Real-time streaming output
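All four strategies operate on the logits the model produces for the next token. The following is a minimal, self-contained sketch of how they are typically combined; it is illustrative rather than the repository's actual sampling code, and the function name and default values are assumptions:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, generated_ids, temperature=1.0, top_k=50,
                      top_p=0.9, repetition_penalty=1.2):
    """Filter a 1-D logits tensor and sample one token id (illustrative helper)."""
    logits = logits.clone()
    # Repetition penalty: discourage tokens that have already been generated
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty
    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it
    logits = logits / temperature
    # Top-k: keep only the k most likely tokens
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability exceeds top_p
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    to_remove = cumulative > top_p
    to_remove[1:] = to_remove[:-1].clone()  # shift right so the threshold-crossing token is kept
    to_remove[0] = False
    logits[sorted_idx[to_remove]] = float("-inf")
    # Sample from the filtered distribution
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Lower temperatures and smaller top-k/top-p values make output more conservative, while the repetition penalty helps suppress the short word-level loops small models are prone to.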
Usage
Install dependencies:
pip install torch tokenizers
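Alternatively, install everything listed in requirements.txt:
pip install -r requirements.txt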
Load the model and tokenizer:
from miniGPT import MiniGPT
from inference import generate_stream
from tokenizers import Tokenizer
import torch
# Load tokenizer
tokenizer = Tokenizer.from_file("wordlevel.json")
# Load model
model = MiniGPT(
vocab_size=tokenizer.get_vocab_size(),
embed_dim=128,
num_heads=4,
ff_dim=512,
num_layers=4,
max_seq_len=128
)
checkpoint = torch.load("model_checkpoint_step20000.pt")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# Generate text
prompt = "Beneath the ancient ruins"
generate_stream(model, tokenizer, prompt, max_new_tokens=60, temperature=1.0, top_k=50, top_p=0.9)
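Under the hood, streaming generation is an autoregressive loop: encode the prompt, repeatedly feed the most recent context window, sample one token, and print it immediately. The sketch below illustrates that pattern, reusing the sample_next_token helper sketched earlier; it assumes the model's forward pass returns logits of shape (batch, seq_len, vocab_size) and may differ from the actual generate_stream implementation in inference.py:

```python
import torch

@torch.no_grad()
def stream_generate(model, tokenizer, prompt, max_new_tokens=60, **sampling_kwargs):
    # Encode the prompt into word-level token ids
    ids = tokenizer.encode(prompt).ids
    for _ in range(max_new_tokens):
        # Feed at most the model's maximum context window (128 tokens)
        context = torch.tensor([ids[-128:]])
        logits = model(context)  # assumed shape: (1, seq_len, vocab_size)
        next_id = sample_next_token(logits[0, -1], ids, **sampling_kwargs)
        ids.append(next_id)
        # Print each new word as soon as it is sampled
        print(tokenizer.decode([next_id]), end=" ", flush=True)
    print()
```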
Training
Train from scratch on any plain-text dataset:
python training.py
Training includes:
- Checkpointing
- Sample generation previews
- Word-level tokenization with the Hugging Face tokenizers library (a tokenizer-training sketch follows this list)
- Custom datasets via alphabetical_dataset.txt or your own
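To prepare a tokenizer for your own dataset, the tokenizers library can train a word-level vocabulary directly from a plain-text file. The sketch below shows one reasonable setup; the unknown-token symbol, special tokens, and whitespace pre-tokenizer are assumptions, and the repository's Tokenizer.py may configure these differently:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Word-level model with a fallback token for out-of-vocabulary words
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on a plain-text corpus and save in the format the Usage section loads
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(["alphabetical_dataset.txt"], trainer)
tokenizer.save("wordlevel.json")
```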
Files in This Repository
File | Purpose |
---|---|
miniGPT.py | Core Transformer model |
transformer.py | Transformer block logic |
multiheadattention.py | Multi-head attention module |
Tokenizer.py | Tokenizer loader |
training.py | Training loop |
inference.py | CLI and streaming generation |
dataprocess.py | Text preprocessing tools |
wordlevel.json | Trained word-level tokenizer |
alphabetical_dataset.txt | Sample dataset |
requirements.txt | Required dependencies |
Model Card
Property | Value |
---|---|
Model Type | Decoder-only GPT |
Size | Small (~4.6M params) |
Trained On | Word-level dataset (custom) |
Intended Use | Text generation, educational demo |
License | MIT |
Intended Use and Limitations
This model is meant for educational, experimental, and research purposes. It is not suitable for commercial or production use out of the box. Expect limitations in coherence, factuality, and long-context reasoning.
Contributions
We welcome improvements, bug fixes, and new features!
# Fork, clone, and create a branch
git clone https://github.com/austin207/Transformer-Virtue-v2.git
cd Transformer-Virtue-v2
git checkout -b feature/your-feature
Then open a pull request!
License
This project is licensed under the MIT License.
Explore More
- Based on the GPT architecture from OpenAI
- Inspired by karpathy/nanoGPT
- Compatible with Hugging Face tools and tokenizer ecosystem