---
license: mit
base_model: gpt2
tags:
- generated_from_keras_callback
model-index:
- name: deneme_linux
  results: []
---
# deneme_linux
This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:
- Train Loss: 2.7996
- Validation Loss: 7.3305
- Epoch: 149
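
Since this is a TensorFlow/Keras fine-tune of gpt2, it should load with the TF auto classes from `transformers`. A minimal sketch, assuming the model is published under the repo id `deneme_linux` (adjust to the actual Hub path):

```python
# Minimal loading/generation sketch.
# NOTE: the repo id below is an assumption; replace it with the actual Hub path.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

repo_id = "deneme_linux"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Merhaba", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```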
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -995, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
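
The serialized optimizer above matches the AdamWeightDecay/WarmUp combination that `transformers.create_optimizer` builds for TensorFlow. A rough reconstruction is sketched below; the exact call used for this run is not documented, and the step count is only inferred from the config (a `decay_steps` of -995 together with 1000 warmup steps suggests a total-step estimate of 1000 + (-995) = 5, i.e. smaller than the warmup length).

```python
# Sketch: rebuilding a comparable optimizer with the TF helper from transformers.
# num_train_steps is inferred from the serialized schedule, not documented.
from transformers import create_optimizer

optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=5,        # inferred: decay_steps (-995) + warmup_steps (1000)
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# The optimizer is then passed to Keras as usual, e.g. model.compile(optimizer=optimizer)
```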
### Training results
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
9.3341 | 9.3329 | 0 |
9.3273 | 9.3261 | 1 |
9.3176 | 9.3147 | 2 |
9.3034 | 9.2988 | 3 |
9.2891 | 9.2790 | 4 |
9.2696 | 9.2563 | 5 |
9.2427 | 9.2321 | 6 |
9.2079 | 9.2076 | 7 |
9.1797 | 9.1831 | 8 |
9.1538 | 9.1594 | 9 |
9.1238 | 9.1368 | 10 |
9.0976 | 9.1149 | 11 |
9.0693 | 9.0941 | 12 |
9.0416 | 9.0740 | 13 |
9.0129 | 9.0552 | 14 |
8.9912 | 9.0376 | 15 |
8.9680 | 9.0204 | 16 |
8.9454 | 9.0037 | 17 |
8.9195 | 8.9877 | 18 |
8.8975 | 8.9721 | 19 |
8.8795 | 8.9572 | 20 |
8.8502 | 8.9428 | 21 |
8.8225 | 8.9281 | 22 |
8.8015 | 8.9138 | 23 |
8.7767 | 8.9003 | 24 |
8.7509 | 8.8865 | 25 |
8.7220 | 8.8734 | 26 |
8.6941 | 8.8605 | 27 |
8.6681 | 8.8465 | 28 |
8.6301 | 8.8336 | 29 |
8.5992 | 8.8200 | 30 |
8.5714 | 8.8052 | 31 |
8.5383 | 8.7926 | 32 |
8.5024 | 8.7789 | 33 |
8.4610 | 8.7636 | 34 |
8.4281 | 8.7503 | 35 |
8.3899 | 8.7361 | 36 |
8.3533 | 8.7230 | 37 |
8.3132 | 8.7070 | 38 |
8.2752 | 8.6910 | 39 |
8.2345 | 8.6810 | 40 |
8.1960 | 8.6648 | 41 |
8.1543 | 8.6492 | 42 |
8.1172 | 8.6380 | 43 |
8.0813 | 8.6207 | 44 |
8.0300 | 8.6091 | 45 |
7.9933 | 8.5904 | 46 |
7.9482 | 8.5793 | 47 |
7.9128 | 8.5605 | 48 |
7.8651 | 8.5490 | 49 |
7.8304 | 8.5362 | 50 |
7.7855 | 8.5210 | 51 |
7.7519 | 8.5072 | 52 |
7.7060 | 8.4953 | 53 |
7.6608 | 8.4803 | 54 |
7.6056 | 8.4718 | 55 |
7.5630 | 8.4561 | 56 |
7.5407 | 8.4417 | 57 |
7.4962 | 8.4266 | 58 |
7.4505 | 8.4215 | 59 |
7.4109 | 8.3973 | 60 |
7.3746 | 8.3906 | 61 |
7.3244 | 8.3758 | 62 |
7.2809 | 8.3652 | 63 |
7.2430 | 8.3495 | 64 |
7.1911 | 8.3423 | 65 |
7.1611 | 8.3227 | 66 |
7.1075 | 8.3119 | 67 |
7.0734 | 8.3032 | 68 |
7.0258 | 8.2899 | 69 |
6.9824 | 8.2817 | 70 |
6.9412 | 8.2611 | 71 |
6.8944 | 8.2550 | 72 |
6.8464 | 8.2429 | 73 |
6.8119 | 8.2240 | 74 |
6.7580 | 8.2199 | 75 |
6.7163 | 8.2044 | 76 |
6.6795 | 8.1819 | 77 |
6.6326 | 8.1847 | 78 |
6.5853 | 8.1733 | 79 |
6.5533 | 8.1524 | 80 |
6.4894 | 8.1398 | 81 |
6.4450 | 8.1347 | 82 |
6.3933 | 8.1220 | 83 |
6.3410 | 8.1031 | 84 |
6.3249 | 8.0906 | 85 |
6.2508 | 8.0915 | 86 |
6.2044 | 8.0682 | 87 |
6.1633 | 8.0565 | 88 |
6.1228 | 8.0491 | 89 |
6.0807 | 8.0392 | 90 |
6.0308 | 8.0189 | 91 |
5.9657 | 8.0094 | 92 |
5.9309 | 7.9979 | 93 |
5.8735 | 7.9804 | 94 |
5.8191 | 7.9702 | 95 |
5.7671 | 7.9677 | 96 |
5.7181 | 7.9494 | 97 |
5.6724 | 7.9402 | 98 |
5.6309 | 7.9209 | 99 |
5.5713 | 7.9112 | 100 |
5.5281 | 7.8977 | 101 |
5.4531 | 7.8884 | 102 |
5.4251 | 7.8717 | 103 |
5.3797 | 7.8637 | 104 |
5.3067 | 7.8538 | 105 |
5.2699 | 7.8436 | 106 |
5.2156 | 7.8301 | 107 |
5.1551 | 7.8185 | 108 |
5.1223 | 7.8017 | 109 |
5.0656 | 7.7927 | 110 |
4.9996 | 7.7754 | 111 |
4.9432 | 7.7580 | 112 |
4.9028 | 7.7489 | 113 |
4.8242 | 7.7411 | 114 |
4.7516 | 7.7196 | 115 |
4.7323 | 7.7101 | 116 |
4.6725 | 7.7042 | 117 |
4.6302 | 7.6833 | 118 |
4.5391 | 7.6679 | 119 |
4.5007 | 7.6575 | 120 |
4.4435 | 7.6530 | 121 |
4.3905 | 7.6396 | 122 |
4.3257 | 7.6236 | 123 |
4.2915 | 7.6106 | 124 |
4.1985 | 7.5916 | 125 |
4.1590 | 7.5937 | 126 |
4.1070 | 7.5777 | 127 |
4.0532 | 7.5640 | 128 |
3.9899 | 7.5493 | 129 |
3.9289 | 7.5384 | 130 |
3.8696 | 7.5265 | 131 |
3.7945 | 7.5198 | 132 |
3.7454 | 7.5054 | 133 |
3.6815 | 7.4894 | 134 |
3.6453 | 7.4796 | 135 |
3.5649 | 7.4746 | 136 |
3.5214 | 7.4608 | 137 |
3.4517 | 7.4473 | 138 |
3.3937 | 7.4363 | 139 |
3.3266 | 7.4263 | 140 |
3.2744 | 7.4128 | 141 |
3.2199 | 7.3996 | 142 |
3.1601 | 7.3887 | 143 |
3.0998 | 7.3737 | 144 |
3.0584 | 7.3648 | 145 |
2.9785 | 7.3565 | 146 |
2.9186 | 7.3513 | 147 |
2.8455 | 7.3410 | 148 |
2.7996 | 7.3305 | 149 |
## Framework versions
- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2