# NanoGPT Shakespeare - Compressed Model
This is a compressed version of nanoGPT trained on Shakespeare text. The compression uses low-rank matrix decomposition of the MLP weight matrices.
## Model Details
- Parameters: 562,432 (1.4× compression from 804,096 original)
- Compression Method: Low-rank decomposition (rank=8) on MLP layers 2 and 3
- Architecture: GPT with compressed MLP layers (6 layers, 6 heads, 384 embedding dim)
- Training Data: Shakespeare corpus
- Tokenization: Character-level
- Context Length: 256 tokens
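For reference, these hyperparameters correspond to a nanoGPT-style configuration. A minimal sketch is below; the field names follow Karpathy's nanoGPT `GPTConfig`, and the `vocab_size` and `dropout` values are assumptions, since they are not listed above:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    """Hypothetical nanoGPT-style config mirroring the architecture above."""
    block_size: int = 256   # context length (tokens)
    vocab_size: int = 65    # assumed: nanoGPT's character-level Shakespeare vocabulary
    n_layer: int = 6        # transformer blocks
    n_head: int = 6         # attention heads per block
    n_embd: int = 384       # embedding dimension
    dropout: float = 0.0    # assumed default

config = GPTConfig()
print(config)
```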
## Compression Results
- Original Parameters: 804,096
- Compressed Parameters: 562,432
- Compression Ratio: 1.4× smaller
- Parameter Reduction: 30.1% (241,664 fewer parameters)
- Method: SVD-based low-rank approximation of weight matrices
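As a quick sanity check, the ratio and reduction figures above follow directly from the two parameter counts:

```python
# Recompute the headline compression figures from the counts above
original_params = 804_096
compressed_params = 562_432

ratio = original_params / compressed_params      # ≈ 1.43x smaller
saved = original_params - compressed_params      # 241,664 fewer parameters
reduction_pct = 100 * saved / original_params    # ≈ 30.1%

print(f"{ratio:.2f}x smaller, {saved:,} fewer parameters ({reduction_pct:.1f}%)")
```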
## Usage
```python
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)

# Generate Shakespeare-style text
text = "ROMEO:"
inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(inputs, max_length=100, temperature=0.8)
result = tokenizer.decode(outputs[0])
print(result)
```
## LLM AutoEval Configuration
Use this model in LLM AutoEval:
- MODEL_ID: prompterminal/nanogpt-shakespeare-compressed
- BENCHMARK: nous
- TRUST_REMOTE_CODE: true
- GPU: RTX 3090
## Technical Details
The compression technique uses Singular Value Decomposition (SVD) to decompose large weight matrices into smaller low-rank factors:
- Original MLP layer: W ∈ R^(d×4d)
- Compressed: W ≈ U·V where U ∈ R^(d×r), V ∈ R^(r×4d)
- Rank: r = 8 (much smaller than d = 384)
This aims to preserve the model's expressiveness while significantly reducing the parameter count.
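A minimal sketch of this factorization in PyTorch is shown below. It is illustrative only (the actual compression script for this checkpoint is not reproduced here) and uses truncated SVD to split one MLP projection into two stacked low-rank linear layers:

```python
import torch
import torch.nn as nn

def low_rank_factorize(W: torch.Tensor, rank: int = 8):
    """Approximate W as A @ B via truncated SVD, keeping the top `rank` singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank), singular values absorbed into A
    B = Vh[:rank, :]             # (rank, in_features)
    return A, B

d, rank = 384, 8
# nn.Linear(d, 4*d) stores its weight as a (4*d, d) matrix
original = nn.Linear(d, 4 * d, bias=False)
A, B = low_rank_factorize(original.weight.data, rank=rank)

# Replace the single projection with two stacked low-rank projections
down = nn.Linear(d, rank, bias=False)     # computes x @ B.T
up = nn.Linear(rank, 4 * d, bias=False)   # computes (.) @ A.T
down.weight.data = B
up.weight.data = A

x = torch.randn(2, d)
# The stacked layers implement the rank-8 approximation of the original projection
print((up(down(x)) - x @ (A @ B).T).abs().max())  # ~0 up to float error
```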
## Evaluation
This model demonstrates low-rank compression of a small GPT with a substantial reduction in parameter count. It is well suited for:
- Resource-constrained deployment
- Edge computing applications
- Research into compression techniques
- Educational purposes
## Citation
Based on nanoGPT by Andrej Karpathy. Compression technique inspired by low-rank approximation methods in deep learning.