NanoGPT Shakespeare - Compressed Model

This is a compressed version of nanoGPT trained on Shakespeare text using low-rank matrix decomposition.

Model Details

  • Parameters: 562,432 (1.4× compression from 804,096 original)
  • Compression Method: Low-rank decomposition (rank=8) on MLP layers 2 and 3
  • Architecture: GPT with compressed MLP layers (6 layers, 6 heads, 384 embedding dim)
  • Training Data: Shakespeare corpus
  • Tokenization: Character-level
  • Context Length: 256 tokens
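
In nanoGPT terms, these hyperparameters map to a config roughly like the following sketch (the `vocab_size` of 65 is an assumption based on the standard character-level Shakespeare dataset, not stated in this card):

```python
from dataclasses import dataclass

# Illustrative nanoGPT-style config matching the numbers above
@dataclass
class GPTConfig:
    block_size: int = 256   # context length
    vocab_size: int = 65    # char-level Shakespeare vocabulary (assumption)
    n_layer: int = 6        # transformer layers
    n_head: int = 6         # attention heads
    n_embd: int = 384       # embedding dimension

cfg = GPTConfig()
print(cfg.n_embd // cfg.n_head)  # 64-dim per head
```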

Compression Results

  • Original Parameters: 804,096
  • Compressed Parameters: 562,432
  • Compression Ratio: 1.4× smaller
  • Parameter Reduction: 30.1% (241,664 fewer parameters)
  • Method: SVD-based low-rank approximation of weight matrices
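
The quoted figures are internally consistent, as a quick arithmetic check shows:

```python
# Sanity-check the compression numbers quoted above
original = 804_096
compressed = 562_432

reduction = original - compressed
print(reduction)                             # 241664 fewer parameters
print(round(reduction / original * 100, 1))  # 30.1 (% reduction)
print(round(original / compressed, 1))       # 1.4 (x smaller)
```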

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (causal LM head is needed for generate())
model = AutoModelForCausalLM.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)

# Generate Shakespeare-style text
text = "ROMEO:"
inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(inputs, max_length=100, temperature=0.8, do_sample=True)
result = tokenizer.decode(outputs[0])
print(result)

LLM AutoEval Configuration

Use this model in LLM AutoEval:

MODEL_ID: prompterminal/nanogpt-shakespeare-compressed
BENCHMARK: nous
TRUST_REMOTE_CODE: true
GPU: RTX 3090

Technical Details

The compression technique uses Singular Value Decomposition (SVD) to decompose large weight matrices into smaller low-rank factors:

  • Original MLP layer: W ∈ R^(d×4d)
  • Compressed: W ≈ U·V where U ∈ R^(d×r), V ∈ R^(r×4d)
  • Rank: r = 8 (much smaller than d = 384)

This maintains model expressiveness while significantly reducing parameters.
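
A minimal sketch of this truncated-SVD factorization in PyTorch (illustrative only; function names here are not the repo's actual code):

```python
import torch

def low_rank_compress(W, r=8):
    """Approximate W (d x 4d) by two factors A (d x r) and B (r x 4d)."""
    # Truncated SVD: W ≈ U_r @ diag(S_r) @ Vh_r
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # fold singular values into the left factor
    B = Vh[:r, :]
    return A, B

d = 384
W = torch.randn(d, 4 * d)            # original MLP weight: 589,824 params
A, B = low_rank_compress(W, r=8)
print(A.shape, B.shape)              # torch.Size([384, 8]) torch.Size([8, 1536])
print(A.numel() + B.numel())         # 15360 params for this layer
```

At inference time the layer computes `x @ A @ B` instead of `x @ W`, trading a small approximation error for the parameter savings.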

Evaluation

This model demonstrates neural network compression with minimal performance loss. It is well suited for:

  • Resource-constrained deployment
  • Edge computing applications
  • Research into compression techniques
  • Educational purposes

Citation

Based on nanoGPT by Andrej Karpathy. Compression technique inspired by low-rank approximation methods in deep learning.
