---
title: Shakespeare Coriolanus Transformer
emoji: 📚
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

# Shakespeare Coriolanus Transformer

This is a test model created to train and test a basic, small decoder-only transformer with 124M parameters. The code has modules to both train and test the model. The trained model can be tried out on Hugging Face.

# Steps to Run Locally

1. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install the requirements and the Hugging Face CLI:

```bash
pip install -r requirements.txt
pip install --upgrade huggingface-hub
```

3. To train the model:

```bash
python src/train.py
```

4. To run the app:

```bash
python src/app.py
```

The interface will be available at `http://localhost:7860` by default.

# Training Logs

```
(venv) gitesh.grover@Giteshs-MacBook-Pro ai-era-assignment12 % python train.py
[INFO] Using device: mps
[STEP 1] Preparing datasets...
/Users/gitesh.grover/Study/AI-ERA/venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
loaded 338025 tokens
1 epoch = 41 batches
BatchSize: 256 || Tokens per batch; 32
[STEP 2] Initializing model...
[STEP 3] Printing Model Architecture Summary...
Model Architecture:
DecoderTransformer(
  (wte): Embedding(50257, 768)
  (wpe): Embedding(1024, 768)
  (blocks): ModuleList(
    (0-11): 12 x Block(
      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (att): Attention(
        (w_qkv): Linear(in_features=768, out_features=2304, bias=True)
        (proj): Linear(in_features=768, out_features=768, bias=True)
      )
      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (mlp): MLP(
        (fc): Linear(in_features=768, out_features=3072, bias=True)
        (gelu): GELU(approximate='tanh')
        (proj): Linear(in_features=3072, out_features=768, bias=True)
      )
    )
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
Total Parameters: 124.44M
Total Steps 12300 (epochs 300 , stepsPerEpoch 41)
[STEP 4] Starting Training...
Epoch 1, Loss: 11.0051
Epoch 2, Loss: 6.6564
Epoch 3, Loss: 6.1045
Epoch 4, Loss: 5.6797
Epoch 5, Loss: 5.3227
Epoch 6, Loss: 4.9817
Epoch 7, Loss: 4.6557
Epoch 8, Loss: 4.4270
Epoch 9, Loss: 4.2327
Epoch 10, Loss: 3.9861
Epoch 11, Loss: 3.7526
Epoch 12, Loss: 3.5475
Epoch 13, Loss: 3.3379
Epoch 14, Loss: 3.1133
Epoch 15, Loss: 2.8888
Epoch 16, Loss: 2.7211
Epoch 17, Loss: 2.4558
Epoch 18, Loss: 2.1982
Epoch 19, Loss: 1.9944
Epoch 20, Loss: 1.7707
Epoch 21, Loss: 1.6288
Epoch 22, Loss: 1.4231
Epoch 23, Loss: 1.2248
Epoch 24, Loss: 1.0180
Epoch 25, Loss: 0.8970
Epoch 26, Loss: 0.7644
Epoch 27, Loss: 0.6474
Epoch 28, Loss: 0.5318
Epoch 29, Loss: 0.4483
Epoch 30, Loss: 0.3601
Epoch 31, Loss: 0.2932
Epoch 32, Loss: 0.2754
Epoch 33, Loss: 0.2155
Epoch 34, Loss: 0.2092
Epoch 35, Loss: 0.1893
Epoch 36, Loss: 0.1753
Epoch 37, Loss: 0.1671
:
:
Epoch 203, Loss: 0.1224
Epoch 204, Loss: 0.1243
Epoch 205, Loss: 0.1308
Epoch 206, Loss: 0.1358
Epoch 207, Loss: 0.1413
Epoch 208, Loss: 0.1425
Epoch 209, Loss: 0.1281
Epoch 210, Loss: 0.1264
Epoch 211, Loss: 0.1305
Epoch 212, Loss: 0.1399
Epoch 213, Loss: 0.1266
Epoch 214, Loss: 0.1135
Epoch 215, Loss: 0.1127
Epoch 216, Loss: 0.1137
Epoch 217, Loss: 0.1045
Epoch 218, Loss: 0.1074
Epoch 219, Loss: 0.1014
Epoch 220, Loss: 0.0997
Target loss achieved at step 8979. Breaking 0.09973063319921494
[STEP 5] Saving Model...
[STEP 6] Testing by predicting next few tokens
X Shape before test: torch.Size([256, 32]) 256
Y Shape after test: torch.Size([256, 30])
```
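
# Model Sketch

The model trained by `src/train.py` is the GPT-2-small-sized decoder printed in the log above (12 layers, 768-dim embeddings, 1024-token context, 50257-token vocabulary). The snippet below is a minimal PyTorch sketch of that architecture for reference, not the repo's actual code: the head count (12) and the weight tying between `wte` and `lm_head` are assumptions, although the tied configuration is what reproduces the reported 124.44M parameter count. It also uses `F.scaled_dot_product_attention`, which requires PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Attention(nn.Module):
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.n_head = n_head
        self.w_qkv = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection (768 -> 2304)
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.w_qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal self-attention
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class MLP(nn.Module):
    def __init__(self, n_embd=768):
        super().__init__()
        self.fc = nn.Linear(n_embd, 4 * n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.proj(self.gelu(self.fc(x)))


class Block(nn.Module):
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.att = Attention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = MLP(n_embd)

    def forward(self, x):
        x = x + self.att(self.ln_1(x))  # pre-norm residual attention
        x = x + self.mlp(self.ln_2(x))  # pre-norm residual MLP
        return x


class DecoderTransformer(nn.Module):
    def __init__(self, vocab_size=50257, block_size=1024, n_layer=12, n_embd=768, n_head=12):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)  # token embeddings
        self.wpe = nn.Embedding(block_size, n_embd)  # learned positional embeddings
        self.blocks = nn.ModuleList([Block(n_embd, n_head) for _ in range(n_layer)])
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        # assumed weight tying; it is what yields the reported 124.44M parameters
        self.lm_head.weight = self.wte.weight

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.wte(idx) + self.wpe(pos)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(x)  # logits over the vocabulary


if __name__ == "__main__":
    model = DecoderTransformer()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"Total Parameters: {n_params / 1e6:.2f}M")  # ~124.44M with tied weights
```

Running the snippet prints the parameter count so it can be checked against the architecture summary in the training log.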