NanoGPT Shakespeare - Compressed Model

This is a compressed version of nanoGPT trained on Shakespeare text using low-rank matrix decomposition.

Model Details

  • Parameters: 562,432 (1.4× compression from 804,096 original)
  • Compression Method: Low-rank decomposition (rank=8) on MLP layers 2 and 3
  • Architecture: GPT with compressed MLP layers (6 layers, 6 heads, 384 embedding dim)
  • Training Data: Shakespeare corpus
  • Tokenization: Character-level
  • Context Length: 256 tokens
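
In nanoGPT terms, these hyperparameters map to a config roughly like the following sketch (the `vocab_size` of 65 is an assumption based on the standard character-level Shakespeare dataset, not stated in this card):

```python
from dataclasses import dataclass

# Illustrative nanoGPT-style config matching the numbers above
@dataclass
class GPTConfig:
    block_size: int = 256   # context length
    vocab_size: int = 65    # char-level Shakespeare vocabulary (assumption)
    n_layer: int = 6        # transformer layers
    n_head: int = 6         # attention heads
    n_embd: int = 384       # embedding dimension

cfg = GPTConfig()
print(cfg.n_embd // cfg.n_head)  # 64-dim per head
```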

Compression Results

  • Original Parameters: 804,096
  • Compressed Parameters: 562,432
  • Compression Ratio: 1.4× smaller
  • Parameter Reduction: 30.1% (241,664 fewer parameters)
  • Method: SVD-based low-rank approximation of weight matrices
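
The quoted figures are internally consistent, as a quick arithmetic check shows:

```python
# Sanity-check the compression numbers quoted above
original = 804_096
compressed = 562_432

reduction = original - compressed
print(reduction)                             # 241664 fewer parameters
print(round(reduction / original * 100, 1))  # 30.1 (% reduction)
print(round(original / compressed, 1))       # 1.4 (x smaller)
```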

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (causal LM head is needed for generate())
model = AutoModelForCausalLM.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("prompterminal/nanogpt-shakespeare-compressed", trust_remote_code=True)

# Generate Shakespeare-style text
text = "ROMEO:"
inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(inputs, max_length=100, temperature=0.8, do_sample=True)
result = tokenizer.decode(outputs[0])
print(result)

LLM AutoEval Configuration

Use this model in LLM AutoEval:

MODEL_ID: prompterminal/nanogpt-shakespeare-compressed
BENCHMARK: nous
TRUST_REMOTE_CODE: true
GPU: RTX 3090

Technical Details

The compression technique uses Singular Value Decomposition (SVD) to decompose large weight matrices into smaller low-rank factors:

  • Original MLP layer: W ∈ R^(d×4d)
  • Compressed: W ≈ U·V where U ∈ R^(d×r), V ∈ R^(r×4d)
  • Rank: r = 8 (much smaller than d = 384)

This maintains model expressiveness while significantly reducing parameters.
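
A minimal sketch of this truncated-SVD factorization in PyTorch (illustrative only; function names here are not the repo's actual code):

```python
import torch

def low_rank_compress(W, r=8):
    """Approximate W (d x 4d) by two factors A (d x r) and B (r x 4d)."""
    # Truncated SVD: W ≈ U_r @ diag(S_r) @ Vh_r
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # fold singular values into the left factor
    B = Vh[:r, :]
    return A, B

d = 384
W = torch.randn(d, 4 * d)            # original MLP weight: 589,824 params
A, B = low_rank_compress(W, r=8)
print(A.shape, B.shape)              # torch.Size([384, 8]) torch.Size([8, 1536])
print(A.numel() + B.numel())         # 15360 params for this layer
```

At inference time the layer computes `x @ A @ B` instead of `x @ W`, trading a small approximation error for the parameter savings.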

Evaluation

This model demonstrates neural network compression with minimal performance loss. It is well suited for:

  • Resource-constrained deployment
  • Edge computing applications
  • Research into compression techniques
  • Educational purposes

Citation

Based on nanoGPT by Andrej Karpathy. Compression technique inspired by low-rank approximation methods in deep learning.
