Custom 57M Language Model

A custom 57.55M-parameter causal language model with a modern transformer architecture.

Model Details

  • Parameters: 57,553,632 (57.55M); a sanity check follows this list
  • Architecture: 12-layer Transformer
  • Hidden Size: 432
  • Attention Heads: 8
  • Head Dimension: 54
  • Intermediate Size: 1,728
  • Vocabulary Size: 50,257 (GPT-2 tokenizer)
  • Max Sequence Length: 1,024
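
The stated total can be reproduced from the numbers above. A quick sanity check, assuming bias-free linear projections, a three-matrix SwiGLU feed-forward, and two RMSNorms per layer plus a final one (consistent with the feature list below):

vocab, hidden, layers, inter = 50_257, 432, 12, 1_728

embedding = vocab * hidden       # input embedding, tied with the output head
attention = 4 * hidden * hidden  # Q, K, V, and output projections
ffn       = 3 * hidden * inter   # SwiGLU: gate, up, and down matrices
norms     = 2 * hidden           # pre-attention and pre-FFN RMSNorm weights

total = embedding + layers * (attention + ffn + norms) + hidden  # + final norm
print(f"{total:,}")  # 57,553,632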

Architecture Features

  • RoPE: Rotary Position Embeddings (θ=10000.0)
  • SwiGLU Activation: Swish-Gated Linear Unit in the feed-forward networks
  • RMSNorm: Root Mean Square Layer Normalization (ε=1e-06); all three are sketched after this list
  • Tied Embeddings: Input and output embeddings share weights
  • Dropout: 0.1 dropout rate
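
For readers new to these components, here is a minimal PyTorch sketch of RMSNorm, SwiGLU, and RoPE as described above; class and function names are illustrative, not the model's actual module names:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization (eps=1e-6)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal RMS of the features; no mean subtraction.
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLUFeedForward(nn.Module):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int = 432, hidden: int = 1728):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def apply_rope(x, theta: float = 10000.0):
    """Rotary Position Embeddings over (batch, seq, heads, head_dim)."""
    _, seq, _, dim = x.shape
    # One rotation frequency per feature pair, one angle per position.
    freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each even/odd feature pair by its position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: rotate queries shaped for the 8 heads of dimension 54 used here.
q = apply_rope(torch.randn(1, 16, 8, 54))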

Training Configuration

  • Dummy Phase: 2 epochs, 1,000 samples, LR=0.0005
  • C4 Phase: 3 epochs, 1,000 samples, LR=0.0003
  • Optimizer: AdamW (weight_decay=0.1)
  • Scheduler: Cosine Annealing
  • Gradient Clipping: max norm 1.0 (a training-loop sketch follows this list)
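
A minimal sketch of how this configuration could be wired up in PyTorch; the learning rate shown is the C4-phase value, and model, dataloader, and num_steps are placeholders supplied by the surrounding training script:

import torch

# Placeholders: `model`, `dataloader`, and `num_steps` are assumed to exist.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(**batch).loss  # causal LM loss returned by the model
    loss.backward()
    # Clip the global gradient norm to 1.0 before stepping.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()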

Generation Parameters

  • Temperature: 0.8
  • Top-K: 50
  • Top-P: 0.9
  • Repetition Penalty: 1.1
  • Max New Tokens: 100

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/custom-57m-language-model")
model = AutoModelForCausalLM.from_pretrained("your-username/custom-57m-language-model")

input_text = "The future of artificial intelligence"
inputs = tokenizer(input_text, return_tensors="pt")

# do_sample=True is required for temperature/top-k/top-p to take effect;
# max_new_tokens matches the "Max New Tokens" setting above.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Training Dataset

  • Primary: C4 (Colossal Clean Crawled Corpus); a loading sketch follows this list
  • Warm-up: Synthetic dummy data for initial training
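
For reference, a minimal sketch of streaming a 1,000-sample C4 subset with the Hugging Face datasets library; the "allenai/c4" dataset ID and the English train split are assumptions, since the card does not name the exact source:

from datasets import load_dataset

# Stream English C4 and take a 1,000-sample subset, matching the
# C4-phase sample count above. The dataset ID is an assumption.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
samples = [example["text"] for example in c4.take(1000)]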

License

MIT License

Model Card

This model was trained as an educational demonstration of a modern transformer implementation, featuring techniques such as Rotary Position Embeddings (RoPE) and SwiGLU activations.
