# GPT2WaP
This model is a GPT-2 model trained from scratch on the text of *War and Peace*. It achieves the following results on the evaluation set (the perplexity is simply the exponential of the loss, as the short check below confirms):
- Loss: 9.0987
- Perplexity: 8943.6289
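
A quick sanity check of the relationship between the two reported metrics, with the values copied from the results above:

```python
import math

# Perplexity is exp(cross-entropy loss); values taken from the evaluation results above.
eval_loss = 9.0987
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # ~8943.6, matching the reported perplexity
```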
## Model description
More information needed
## Intended uses & limitations
More information needed
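
Although detailed usage guidance has not been provided yet, the checkpoint loads like any other GPT-2 model. A minimal generation sketch, assuming the repository id `Kasdeja23/GPT2WaP`; the prompt and sampling settings are purely illustrative:

```python
from transformers import pipeline

# Load this checkpoint as a standard GPT-2 text-generation pipeline.
generator = pipeline("text-generation", model="Kasdeja23/GPT2WaP")

# Prompt and sampling settings are illustrative, not taken from the model card.
outputs = generator(
    "Prince Andrew said",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(outputs[0]["generated_text"])
```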
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
- mixed_precision_training: Native AMP
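
These settings map roughly onto the standard `transformers` `Trainer` API. A minimal sketch of the corresponding `TrainingArguments`; the output directory and the surrounding dataset/model setup are illustrative, not taken from this repository:

```python
from transformers import TrainingArguments

# TrainingArguments roughly matching the hyperparameters listed above.
# With 2 GPUs: 64 * 2 * 4 = 512 effective train batch, 64 * 2 = 128 eval batch.
training_args = TrainingArguments(
    output_dir="gpt2-war-and-peace",   # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=40,
    fp16=True,                          # Native AMP mixed precision
)
```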
### Training results
Training Loss | Epoch | Step | Validation Loss | Perplexity |
---|---|---|---|---|
10.157 | 0.6897 | 10 | 9.2336 | 10235.7480 |
9.2581 | 1.3793 | 20 | 8.9452 | 7671.1870 |
8.8166 | 2.0690 | 30 | 9.4917 | 13248.7207 |
8.5094 | 2.7586 | 40 | 9.5417 | 13928.9434 |
8.0914 | 3.4483 | 50 | 9.5507 | 14054.4785 |
7.663 | 4.1379 | 60 | 9.4760 | 13043.2441 |
7.3275 | 4.8276 | 70 | 9.3510 | 11510.8203 |
6.9788 | 5.5172 | 80 | 9.0822 | 8797.7188 |
6.6639 | 6.2069 | 90 | 8.9803 | 7945.4014 |
6.3749 | 6.8966 | 100 | 8.6494 | 5706.8130 |
6.0702 | 7.5862 | 110 | 8.5696 | 5268.9268 |
5.9107 | 8.2759 | 120 | 8.3612 | 4277.6265 |
5.6724 | 8.9655 | 130 | 8.4294 | 4579.6484 |
5.5949 | 9.6552 | 140 | 8.4934 | 4882.4316 |
5.4904 | 10.3448 | 150 | 8.4683 | 4761.3862 |
5.3792 | 11.0345 | 160 | 8.4647 | 4744.5381 |
5.3091 | 11.7241 | 170 | 8.5767 | 5306.3535 |
5.233 | 12.4138 | 180 | 8.5257 | 5042.5068 |
5.2252 | 13.1034 | 190 | 8.5328 | 5078.8433 |
5.1445 | 13.7931 | 200 | 8.5871 | 5361.9390 |
5.0824 | 14.4828 | 210 | 8.5784 | 5315.4043 |
5.0272 | 15.1724 | 220 | 8.6434 | 5672.6934 |
4.979 | 15.8621 | 230 | 8.6836 | 5905.4277 |
4.924 | 16.5517 | 240 | 8.7112 | 6070.2261 |
4.9394 | 17.2414 | 250 | 8.7233 | 6144.3931 |
4.8663 | 17.9310 | 260 | 8.7411 | 6254.5234 |
4.8599 | 18.6207 | 270 | 8.7824 | 6518.7896 |
4.8572 | 19.3103 | 280 | 8.8338 | 6862.5586 |
4.8064 | 20.0 | 290 | 8.7774 | 6485.7441 |
4.746 | 20.6897 | 300 | 8.8458 | 6944.8892 |
4.7569 | 21.3793 | 310 | 8.8436 | 6930.1416 |
4.6954 | 22.0690 | 320 | 8.8618 | 7057.1084 |
4.7277 | 22.7586 | 330 | 8.8706 | 7119.4478 |
4.6432 | 23.4483 | 340 | 8.9084 | 7393.6138 |
4.6032 | 24.1379 | 350 | 8.9111 | 7413.5176 |
4.6198 | 24.8276 | 360 | 8.9526 | 7728.0210 |
4.5874 | 25.5172 | 370 | 8.9740 | 7895.1641 |
4.5455 | 26.2069 | 380 | 8.9365 | 7604.7129 |
4.5313 | 26.8966 | 390 | 8.9738 | 7893.2969 |
4.5297 | 27.5862 | 400 | 8.9659 | 7831.8110 |
4.5279 | 28.2759 | 410 | 8.9914 | 8034.0391 |
4.4974 | 28.9655 | 420 | 9.0293 | 8344.2529 |
4.4554 | 29.6552 | 430 | 9.0191 | 8259.1533 |
4.4651 | 30.3448 | 440 | 9.0236 | 8296.4531 |
4.4647 | 31.0345 | 450 | 9.0349 | 8391.1279 |
4.4668 | 31.7241 | 460 | 9.0530 | 8543.8340 |
4.4264 | 32.4138 | 470 | 9.0722 | 8709.4141 |
4.4008 | 33.1034 | 480 | 9.0876 | 8844.6104 |
4.3982 | 33.7931 | 490 | 9.0711 | 8700.4893 |
4.3846 | 34.4828 | 500 | 9.0894 | 8860.7441 |
4.3971 | 35.1724 | 510 | 9.0879 | 8847.6973 |
4.379 | 35.8621 | 520 | 9.0949 | 8909.6025 |
4.3696 | 36.5517 | 530 | 9.1097 | 9042.2295 |
4.3447 | 37.2414 | 540 | 9.1007 | 8961.6953 |
4.3796 | 37.9310 | 550 | 9.0869 | 8839.0781 |
4.364 | 38.6207 | 560 | 9.0987 | 8943.6289 |
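
Note that the validation loss bottoms out around epoch 8 (8.3612 at step 120) and then rises steadily while the training loss keeps falling, which suggests the model overfits this small single-book corpus well before the 40 epochs complete.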
### Framework versions
- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1