# GPT2WaP
This model is a GPT-2 model trained from scratch on the text of *War and Peace*. It achieves the following results on the evaluation set (the perplexity is simply the exponential of the loss, as the short check below confirms):
- Loss: 9.0987
- Perplexity: 8943.6289
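
A quick sanity check of the relationship between the two reported metrics, with the values copied from the results above:

```python
import math

# Perplexity is exp(cross-entropy loss); values taken from the evaluation results above.
eval_loss = 9.0987
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # ~8943.6, matching the reported perplexity
```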
## Model description
More information needed
## Intended uses & limitations
More information needed
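
Although detailed usage guidance has not been provided yet, the checkpoint loads like any other GPT-2 model. A minimal generation sketch, assuming the repository id `Kasdeja23/GPT2WaP`; the prompt and sampling settings are purely illustrative:

```python
from transformers import pipeline

# Load this checkpoint as a standard GPT-2 text-generation pipeline.
generator = pipeline("text-generation", model="Kasdeja23/GPT2WaP")

# Prompt and sampling settings are illustrative, not taken from the model card.
outputs = generator(
    "Prince Andrew said",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(outputs[0]["generated_text"])
```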
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
- mixed_precision_training: Native AMP
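
These settings map roughly onto the standard `transformers` `Trainer` API. A minimal sketch of the corresponding `TrainingArguments`; the output directory and the surrounding dataset/model setup are illustrative, not taken from this repository:

```python
from transformers import TrainingArguments

# TrainingArguments roughly matching the hyperparameters listed above.
# With 2 GPUs: 64 * 2 * 4 = 512 effective train batch, 64 * 2 = 128 eval batch.
training_args = TrainingArguments(
    output_dir="gpt2-war-and-peace",   # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=40,
    fp16=True,                          # Native AMP mixed precision
)
```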
### Training results
Training Loss | Epoch | Step | Validation Loss | Perplexity |
---|---|---|---|---|
10.157 | 0.6897 | 10 | 9.2336 | 10235.7480 |
9.2581 | 1.3793 | 20 | 8.9452 | 7671.1870 |
8.8166 | 2.0690 | 30 | 9.4917 | 13248.7207 |
8.5094 | 2.7586 | 40 | 9.5417 | 13928.9434 |
8.0914 | 3.4483 | 50 | 9.5507 | 14054.4785 |
7.663 | 4.1379 | 60 | 9.4760 | 13043.2441 |
7.3275 | 4.8276 | 70 | 9.3510 | 11510.8203 |
6.9788 | 5.5172 | 80 | 9.0822 | 8797.7188 |
6.6639 | 6.2069 | 90 | 8.9803 | 7945.4014 |
6.3749 | 6.8966 | 100 | 8.6494 | 5706.8130 |
6.0702 | 7.5862 | 110 | 8.5696 | 5268.9268 |
5.9107 | 8.2759 | 120 | 8.3612 | 4277.6265 |
5.6724 | 8.9655 | 130 | 8.4294 | 4579.6484 |
5.5949 | 9.6552 | 140 | 8.4934 | 4882.4316 |
5.4904 | 10.3448 | 150 | 8.4683 | 4761.3862 |
5.3792 | 11.0345 | 160 | 8.4647 | 4744.5381 |
5.3091 | 11.7241 | 170 | 8.5767 | 5306.3535 |
5.233 | 12.4138 | 180 | 8.5257 | 5042.5068 |
5.2252 | 13.1034 | 190 | 8.5328 | 5078.8433 |
5.1445 | 13.7931 | 200 | 8.5871 | 5361.9390 |
5.0824 | 14.4828 | 210 | 8.5784 | 5315.4043 |
5.0272 | 15.1724 | 220 | 8.6434 | 5672.6934 |
4.979 | 15.8621 | 230 | 8.6836 | 5905.4277 |
4.924 | 16.5517 | 240 | 8.7112 | 6070.2261 |
4.9394 | 17.2414 | 250 | 8.7233 | 6144.3931 |
4.8663 | 17.9310 | 260 | 8.7411 | 6254.5234 |
4.8599 | 18.6207 | 270 | 8.7824 | 6518.7896 |
4.8572 | 19.3103 | 280 | 8.8338 | 6862.5586 |
4.8064 | 20.0 | 290 | 8.7774 | 6485.7441 |
4.746 | 20.6897 | 300 | 8.8458 | 6944.8892 |
4.7569 | 21.3793 | 310 | 8.8436 | 6930.1416 |
4.6954 | 22.0690 | 320 | 8.8618 | 7057.1084 |
4.7277 | 22.7586 | 330 | 8.8706 | 7119.4478 |
4.6432 | 23.4483 | 340 | 8.9084 | 7393.6138 |
4.6032 | 24.1379 | 350 | 8.9111 | 7413.5176 |
4.6198 | 24.8276 | 360 | 8.9526 | 7728.0210 |
4.5874 | 25.5172 | 370 | 8.9740 | 7895.1641 |
4.5455 | 26.2069 | 380 | 8.9365 | 7604.7129 |
4.5313 | 26.8966 | 390 | 8.9738 | 7893.2969 |
4.5297 | 27.5862 | 400 | 8.9659 | 7831.8110 |
4.5279 | 28.2759 | 410 | 8.9914 | 8034.0391 |
4.4974 | 28.9655 | 420 | 9.0293 | 8344.2529 |
4.4554 | 29.6552 | 430 | 9.0191 | 8259.1533 |
4.4651 | 30.3448 | 440 | 9.0236 | 8296.4531 |
4.4647 | 31.0345 | 450 | 9.0349 | 8391.1279 |
4.4668 | 31.7241 | 460 | 9.0530 | 8543.8340 |
4.4264 | 32.4138 | 470 | 9.0722 | 8709.4141 |
4.4008 | 33.1034 | 480 | 9.0876 | 8844.6104 |
4.3982 | 33.7931 | 490 | 9.0711 | 8700.4893 |
4.3846 | 34.4828 | 500 | 9.0894 | 8860.7441 |
4.3971 | 35.1724 | 510 | 9.0879 | 8847.6973 |
4.379 | 35.8621 | 520 | 9.0949 | 8909.6025 |
4.3696 | 36.5517 | 530 | 9.1097 | 9042.2295 |
4.3447 | 37.2414 | 540 | 9.1007 | 8961.6953 |
4.3796 | 37.9310 | 550 | 9.0869 | 8839.0781 |
4.364 | 38.6207 | 560 | 9.0987 | 8943.6289 |
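
Note that the validation loss bottoms out around epoch 8 (8.3612 at step 120) and then rises steadily while the training loss keeps falling, which suggests the model overfits this small single-book corpus well before the 40 epochs complete.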
### Framework versions
- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1