transformer

A follow-along of Andrej Karpathy's transformer tutorial

Objective

Recreate the decoder-only transformer from the bottom up

Generate coherent WikiHow articles
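The heart of a decoder-only transformer is causal self-attention: each position may attend only to itself and earlier positions, so the model can be trained to predict the next token. Below is a minimal single-head NumPy sketch of that mechanism (not the project's actual PyTorch code; weight matrices are random stand-ins for learned parameters):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, d) sequence.

    Each position attends only to itself and earlier positions,
    which is what makes the transformer "decoder-only".
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # scaled dot-product scores between every query and key
    scores = q @ k.T / np.sqrt(d)
    # causal mask: forbid attention to future positions
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    # softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector; stacking this layer with feed-forward blocks and residual connections gives the full decoder.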

Data

WikiHow corpus

Results

Subpar text generation, largely due to compute constraints (trained on my potato laptop)

Future improvements

Use GPU

Use a subword tokenizer (BPE, WordPiece, etc.). Character-level tokenization is simplistic and fails to capture the statistics of the corpus
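For reference, the character-level scheme being replaced amounts to two lookup tables built from the corpus's character set, as in Karpathy's tutorial. A small sketch (`text` is a stand-in for the actual corpus):

```python
# Character-level tokenization: one integer id per distinct character.
text = "how to make tea"   # stand-in for the WikiHow corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("make tea")
print(ids)
print(decode(ids))  # round-trips back to "make tea"
```

The vocabulary is tiny (tens of symbols), but sequences become long and the model must spend capacity re-learning spelling; subword tokenizers like BPE trade a larger vocabulary for shorter sequences that better reflect corpus statistics.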

Model size: 36.2M parameters (Safetensors, F32)
