---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- decoder-only
- nlp
- autoregressive
- rope
- gqa
- rmsnorm
- swiglu
- from-scratch
datasets:
- roneneldan/TinyStories
license: apache-2.0
model-index:
- name: GatorGPT2
  results: []
---

# 🐊 GatorGPT2

**GatorGPT2** is a small, decoder-only Transformer trained from scratch on a subset of **TinyStories** for next-token prediction. It uses **RoPE** (rotary positional embeddings), **GQA** (grouped-query attention), **RMSNorm**, and a **SwiGLU** MLP. The tokenizer is **tiktoken** with the **p50k_base** vocabulary.

> **Repo**: `kunjcr2/GatorGPT2`
> **Intended use**: research, experimentation, and educational demos for training/serving custom LMs

---

## πŸ”§ Architecture

- **Type**: decoder-only, causal LM
- **Layers**: `num_hidden_layers = 10`
- **Hidden size**: `hidden_size = 448`
- **Heads**: `num_attention_heads = 8` (GQA with 2 KV heads shared across the 8 query heads, i.e. 4 query heads per KV head)
- **FFN**: SwiGLU, `d_ff β‰ˆ 2Γ— hidden_size`
- **Norm**: RMSNorm (pre-norm blocks)
- **Positional**: RoPE
- **Vocab**: `vocab_size = 50257` (tiktoken p50k_base)
- **Context length**: `max_position_embeddings = 1024`
- **Weight tying**: output head tied with token embeddings
- **Files**:
  - `pytorch_model.bin` (or `model.safetensors`)
  - `config.json` (`model_type: "gator-transformer"`, `auto_map` provided)
  - `modeling_gator.py`, `configuration_gator.py`, `__init__.py`
  - `tokenizer_manifest.json` β†’ `{ "library": "tiktoken", "encoding": "p50k_base" }`

> Custom code is loaded via `trust_remote_code=True`.

---

## πŸ“¦ Install

```bash
pip install torch transformers tiktoken
```

---

## πŸš€ Quickstart (Transformers + tiktoken)

```python
import torch
from transformers import AutoModelForCausalLM
import tiktoken

MODEL_ID = "kunjcr2/GatorGPT2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load model (uses custom modeling code)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.float32,
).to(DEVICE).eval()

# Tokenizer (p50k_base via tiktoken)
tok = tiktoken.get_encoding("p50k_base")

def generate_greedy(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tok.encode(prompt)
    x = torch.tensor([ids], device=DEVICE)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = model(x)
        logits = out["logits"] if isinstance(out, dict) else out.logits
        # Greedy decoding: always pick the highest-probability next token
        next_id = int(torch.argmax(logits[0, -1]))
        x = torch.cat([x, torch.tensor([[next_id]], device=DEVICE)], dim=1)
    return tok.decode(x[0].tolist()).replace("<|endoftext|>", "").strip()

print(generate_greedy("Little girl was"))
```

### Temperature-only sampling (no top-k/p)

```python
def generate_temp(prompt, max_new_tokens=64, temperature=0.9):
    ids = tok.encode(prompt)
    x = torch.tensor([ids], device=DEVICE)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # Scale logits by temperature (guard against division by zero)
            logits = model(x).logits[0, -1] / max(temperature, 1e-6)
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1).item()
        x = torch.cat([x, torch.tensor([[next_id]], device=DEVICE)], dim=1)
    return tok.decode(x[0].tolist()).replace("<|endoftext|>", "").strip()
```

---

## 🌐 Serving with vLLM (Optional)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model kunjcr2/GatorGPT2 \
  --tokenizer kunjcr2/GatorGPT2 \
  --trust-remote-code \
  --dtype float32 \
  --max-model-len 1024 \
  --host 0.0.0.0 --port 8000
```

Call it:

```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"kunjcr2/GatorGPT2","prompt":"Little girl was","max_tokens":64,"temperature":0.9}'
```
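You can also call the server from Python. Below is a minimal sketch using the official `openai` client (v1+); it assumes `pip install openai` and the server started as above. The `api_key` value is a placeholder, since vLLM does not check it by default.

```python
# Query the vLLM OpenAI-compatible endpoint started above.
# Assumes: pip install openai  (v1+ client), server running on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM does not require a real key by default
)

resp = client.completions.create(
    model="kunjcr2/GatorGPT2",
    prompt="Little girl was",
    max_tokens=64,
    temperature=0.9,
)
print(resp.choices[0].text)
```

This hits the same `/v1/completions` route as the `curl` call, with the same request fields.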
---

## πŸ§ͺ Training Summary

* **Data**: `roneneldan/TinyStories` (train split; a subset of ~1.5M stories)
* **Objective**: causal LM (next-token prediction) with cross-entropy loss
* **Optimizer**: AdamW (`lr=3e-4`, `weight_decay=0.01`, `eps=1e-8`)
* **Precision**: bf16 autocast on CUDA during the forward pass for speed
* **Batching**: sliding windows via a `FastDataset` (e.g. window size 512, stride 256)
* **Eval**: periodic validation over fixed batches; train loss downsampled to eval steps for plotting
* **Hardware**: intended for A100-class GPUs; also runs on CPU for debugging (slowly)

> This is a *from-scratch* toy/educational model; quality depends heavily on training steps, data cleaning, and the learning-rate schedule. Expect simple, short English generations.

---

## βœ… Intended Use

* Research on small decoder-only Transformers
* Educational demos (training, saving, publishing to the model hub, vLLM serving)
* Baseline for experimenting with:
  * LoRA/QLoRA, quantization, distillation
  * Attention variants (FlashAttention, GQA configs)
  * Data curation and scaling laws

**Not** intended for production or safety-critical use.

---

## ⚠️ Limitations & Risks

* Trained on children's story data β‡’ limited world knowledge & reasoning
* May output incoherent, repetitive, or undesirable text
* No instruction tuning or RLHF
* The tokenizer is `tiktoken p50k_base` (not a standard HF tokenizer), so the examples use `tiktoken` directly

---

## πŸ“ Repo Structure

```
.
β”œβ”€β”€ config.json
β”œβ”€β”€ pytorch_model.bin        # or model.safetensors
β”œβ”€β”€ modeling_gator.py        # custom architecture (RoPE, GQA, RMSNorm, SwiGLU)
β”œβ”€β”€ configuration_gator.py
β”œβ”€β”€ __init__.py
└── tokenizer_manifest.json  # { "library": "tiktoken", "encoding": "p50k_base" }
```

`config.json` includes:

```json
{
  "model_type": "gator-transformer",
  "architectures": ["GatorModel"],
  "auto_map": {
    "AutoConfig": "configuration_gator.GatorConfig",
    "AutoModelForCausalLM": "modeling_gator.GatorModel"
  }
}
```

---

## πŸ“Š Evaluation

No formal benchmarks are reported. You can compute loss/perplexity on your own validation subset:

```python
import math
import torch
from torch.utils.data import DataLoader

# ...build a DataLoader of (input_ids, target_ids) pairs...

def eval_loss(model, loader, device="cuda"):
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x).logits
            # Flatten (batch, seq, vocab) -> (batch*seq, vocab) for cross-entropy
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), y.view(-1)
            )
            total += loss.item()
            n += 1
    return total / max(n, 1)

val_loss = eval_loss(model, your_val_loader)
print("val loss:", val_loss, " ppl:", math.exp(val_loss))
```

---

## πŸ“œ License

**apache-2.0**

---

## πŸ™Œ Acknowledgements

* **TinyStories** dataset by Ronen Eldan et al. (`roneneldan/TinyStories`)
* Community tooling: **PyTorch**, **πŸ€— Transformers**, **tiktoken**, **vLLM**

---

## βœ‰οΈ Citation

If you use this model, please cite this repository:

```bibtex
@software{GatorGPT2_2025,
  author = {Kunj},
  title  = {GatorGPT2: a small decoder-only Transformer with RoPE+GQA},
  year   = {2025},
  url    = {https://huggingface.co/kunjcr2/GatorGPT2}
}
```