|
--- |
|
title: Shakespeare Coriolanus Transformer |
|
emoji: π |
|
colorFrom: blue |
|
colorTo: red |
|
sdk: gradio |
|
sdk_version: 3.50.2 |
|
app_file: app.py |
|
pinned: false |
|
--- |
|
|
|
# Shakespeare Coriolanus Transformer |
|
This is a test project for training and evaluating a basic, small decoder-only transformer with 124M parameters. The code has modules to both train and test the model, and the trained model can be tried out on Hugging Face.
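The architecture printed in the training logs below matches GPT-2 small: a GPT-2 BPE vocabulary of 50,257 tokens, a 1,024-token context window, and 12 decoder blocks with a hidden size of 768. The 124.44M total also matches GPT-2 small, which suggests the `lm_head` weights are tied to the token embedding. A minimal sketch of that configuration; the field names are illustrative, not necessarily those used in `src/train.py`:

```python
from dataclasses import dataclass

# Illustrative config mirroring the architecture dump in the training logs.
# Names are hypothetical; src/train.py may organize this differently.
@dataclass
class ModelConfig:
    vocab_size: int = 50257  # wte: Embedding(50257, 768) -- GPT-2 BPE vocabulary
    block_size: int = 1024   # wpe: Embedding(1024, 768) -- max context length
    n_layer: int = 12        # (0-11): 12 x Block
    n_embd: int = 768        # hidden size throughout
    n_head: int = 12         # assumed, following GPT-2 small; the fused QKV
                             # projection (768 -> 2304) is 3 x n_embd either way
```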
|
|
|
# Steps to Run Locally |
|
1. Create and activate a virtual environment: |
|
```bash |
|
python -m venv venv |
|
source venv/bin/activate # On Windows: venv\Scripts\activate |
|
``` |
|
|
|
2. Install the requirements and the Hugging Face CLI: |
|
```bash |
|
pip install -r requirements.txt |
|
pip install --upgrade huggingface-hub |
|
``` |
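If you plan to push the trained model or Space to the Hub, you will likely need to authenticate first. A minimal sketch using the `huggingface_hub` library installed above; whether this step is required depends on your workflow:

```python
# Illustrative only: authenticate with the Hugging Face Hub before pushing.
# Requires an access token from https://huggingface.co/settings/tokens
from huggingface_hub import login

login()  # prompts for the access token interactively
```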
|
3. Train the model:
|
```bash |
|
python src/train.py |
|
``` |
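Training picks up the fastest available backend; the logs below show it running on Apple's `mps` device. A minimal sketch of that device selection, assuming PyTorch (the actual logic in `src/train.py` may differ):

```python
import torch

# Prefer CUDA, then Apple Silicon's Metal backend (mps), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"[INFO] Using device: {device}")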
|
|
|
4. Run the app:
|
```bash |
|
python src/app.py |
|
``` |
|
The interface will be available at `http://localhost:7860` by default. |
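For reference, a Gradio app of this shape (SDK version 3.50.2 per the front matter) typically boils down to something like the following sketch; the function name and generation details are illustrative, not the actual contents of `src/app.py`:

```python
import gradio as gr

def generate(prompt: str) -> str:
    # Placeholder: the real app would tokenize the prompt, sample from the
    # trained DecoderTransformer checkpoint, and decode the result.
    return prompt + " ..."

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Textbox(label="Generated text"),
    title="Shakespeare Coriolanus Transformer",
)

demo.launch()  # serves on http://localhost:7860 by default
```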
|
|
|
# Training Logs |
|
``` |
|
|
(venv) gitesh.grover@Giteshs-MacBook-Pro ai-era-assignment12 % python train.py |
|
|
|
[INFO] Using device: mps |
|
[STEP 1] Preparing datasets... |
|
|
loaded 338025 tokens |
|
1 epoch = 41 batches |
|
BatchSize: 256 || Tokens per batch: 32
|
[STEP 2] Initializing model... |
|
[STEP 3] Printing Model Architecture Summary... |
|
|
|
Model Architecture: |
|
DecoderTransformer( |
|
(wte): Embedding(50257, 768) |
|
(wpe): Embedding(1024, 768) |
|
(blocks): ModuleList( |
|
(0-11): 12 x Block( |
|
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(att): Attention( |
|
(w_qkv): Linear(in_features=768, out_features=2304, bias=True) |
|
(proj): Linear(in_features=768, out_features=768, bias=True) |
|
) |
|
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) |
|
(mlp): MLP( |
|
(fc): Linear(in_features=768, out_features=3072, bias=True) |
|
(gelu): GELU(approximate='tanh') |
|
(proj): Linear(in_features=3072, out_features=768, bias=True) |
|
) |
|
) |
|
) |
|
(lm_head): Linear(in_features=768, out_features=50257, bias=False) |
|
) |
|
|
|
Total Parameters: 124.44M |
|
Total Steps 12300 (epochs 300 , stepsPerEpoch 41) |
|
[STEP 4] Starting Training... |
|
Epoch 1, Loss: 11.0051 |
|
Epoch 2, Loss: 6.6564 |
|
Epoch 3, Loss: 6.1045 |
|
Epoch 4, Loss: 5.6797 |
|
Epoch 5, Loss: 5.3227 |
|
Epoch 6, Loss: 4.9817 |
|
Epoch 7, Loss: 4.6557 |
|
Epoch 8, Loss: 4.4270 |
|
Epoch 9, Loss: 4.2327 |
|
Epoch 10, Loss: 3.9861 |
|
Epoch 11, Loss: 3.7526 |
|
Epoch 12, Loss: 3.5475 |
|
Epoch 13, Loss: 3.3379 |
|
Epoch 14, Loss: 3.1133 |
|
Epoch 15, Loss: 2.8888 |
|
Epoch 16, Loss: 2.7211 |
|
Epoch 17, Loss: 2.4558 |
|
Epoch 18, Loss: 2.1982 |
|
Epoch 19, Loss: 1.9944 |
|
Epoch 20, Loss: 1.7707 |
|
Epoch 21, Loss: 1.6288 |
|
Epoch 22, Loss: 1.4231 |
|
Epoch 23, Loss: 1.2248 |
|
Epoch 24, Loss: 1.0180 |
|
Epoch 25, Loss: 0.8970 |
|
Epoch 26, Loss: 0.7644 |
|
Epoch 27, Loss: 0.6474 |
|
Epoch 28, Loss: 0.5318 |
|
Epoch 29, Loss: 0.4483 |
|
Epoch 30, Loss: 0.3601 |
|
Epoch 31, Loss: 0.2932 |
|
Epoch 32, Loss: 0.2754 |
|
Epoch 33, Loss: 0.2155 |
|
Epoch 34, Loss: 0.2092 |
|
Epoch 35, Loss: 0.1893 |
|
Epoch 36, Loss: 0.1753 |
|
Epoch 37, Loss: 0.1671 |
|
|
|
: |
|
: |
|
|
|
Epoch 203, Loss: 0.1224 |
|
Epoch 204, Loss: 0.1243 |
|
Epoch 205, Loss: 0.1308 |
|
Epoch 206, Loss: 0.1358 |
|
Epoch 207, Loss: 0.1413 |
|
Epoch 208, Loss: 0.1425 |
|
Epoch 209, Loss: 0.1281 |
|
Epoch 210, Loss: 0.1264 |
|
Epoch 211, Loss: 0.1305 |
|
Epoch 212, Loss: 0.1399 |
|
Epoch 213, Loss: 0.1266 |
|
Epoch 214, Loss: 0.1135 |
|
Epoch 215, Loss: 0.1127 |
|
Epoch 216, Loss: 0.1137 |
|
Epoch 217, Loss: 0.1045 |
|
Epoch 218, Loss: 0.1074 |
|
Epoch 219, Loss: 0.1014 |
|
Epoch 220, Loss: 0.0997 |
|
|
|
Target loss achieved at step 8979. Breaking |
|
0.09973063319921494 |
|
[STEP 5] Saving Model... |
|
[STEP 6] Testing by predicting next few tokens |
|
X Shape before test: torch.Size([256, 32]) |
|
256 |
|
Y Shape after test: torch.Size([256, 30]) |
|
``` |