---
license: mit
base_model: gpt2
tags:
  - generated_from_keras_callback
model-index:
  - name: deneme_linux
    results: []
---

# deneme_linux

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set (a short usage sketch follows the results):

- Train Loss: 2.7996
- Validation Loss: 7.3305
- Epoch: 149
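
A minimal loading and generation sketch. The hub id `denizzhansahin/deneme_linux` is inferred from this repository's path and is an assumption, not something the card states; the prompt is a placeholder, since the training domain is undocumented:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Assumed hub id, inferred from the repository path; substitute the real one.
repo_id = "denizzhansahin/deneme_linux"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModelForCausalLM.from_pretrained(repo_id)

# Placeholder prompt; the card does not document the training data.
inputs = tokenizer("Hello", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```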

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -995, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
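
A sketch reconstructing the optimizer above with `transformers.create_optimizer`, on the assumption that this is how the config was built (the card does not say). Under that assumption, the logged `decay_steps` of -995 together with `warmup_steps` of 1000 implies `num_train_steps=5` was passed, since `create_optimizer` sets `decay_steps = num_train_steps - num_warmup_steps`:

```python
from transformers import create_optimizer

# Hedged reconstruction of the logged AdamWeightDecay + WarmUp config.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,            # initial_learning_rate in the logged config
    num_train_steps=5,       # inferred: decay_steps (-995) + warmup_steps (1000)
    num_warmup_steps=1000,   # warmup_steps in the logged config
    weight_decay_rate=0.01,  # weight_decay_rate in the logged config
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# The returned optimizer can be passed directly to Keras:
# model.compile(optimizer=optimizer)
```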

### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 9.3341 | 9.3329 | 0 |
| 9.3273 | 9.3261 | 1 |
| 9.3176 | 9.3147 | 2 |
| 9.3034 | 9.2988 | 3 |
| 9.2891 | 9.2790 | 4 |
| 9.2696 | 9.2563 | 5 |
| 9.2427 | 9.2321 | 6 |
| 9.2079 | 9.2076 | 7 |
| 9.1797 | 9.1831 | 8 |
| 9.1538 | 9.1594 | 9 |
| 9.1238 | 9.1368 | 10 |
| 9.0976 | 9.1149 | 11 |
| 9.0693 | 9.0941 | 12 |
| 9.0416 | 9.0740 | 13 |
| 9.0129 | 9.0552 | 14 |
| 8.9912 | 9.0376 | 15 |
| 8.9680 | 9.0204 | 16 |
| 8.9454 | 9.0037 | 17 |
| 8.9195 | 8.9877 | 18 |
| 8.8975 | 8.9721 | 19 |
| 8.8795 | 8.9572 | 20 |
| 8.8502 | 8.9428 | 21 |
| 8.8225 | 8.9281 | 22 |
| 8.8015 | 8.9138 | 23 |
| 8.7767 | 8.9003 | 24 |
| 8.7509 | 8.8865 | 25 |
| 8.7220 | 8.8734 | 26 |
| 8.6941 | 8.8605 | 27 |
| 8.6681 | 8.8465 | 28 |
| 8.6301 | 8.8336 | 29 |
| 8.5992 | 8.8200 | 30 |
| 8.5714 | 8.8052 | 31 |
| 8.5383 | 8.7926 | 32 |
| 8.5024 | 8.7789 | 33 |
| 8.4610 | 8.7636 | 34 |
| 8.4281 | 8.7503 | 35 |
| 8.3899 | 8.7361 | 36 |
| 8.3533 | 8.7230 | 37 |
| 8.3132 | 8.7070 | 38 |
| 8.2752 | 8.6910 | 39 |
| 8.2345 | 8.6810 | 40 |
| 8.1960 | 8.6648 | 41 |
| 8.1543 | 8.6492 | 42 |
| 8.1172 | 8.6380 | 43 |
| 8.0813 | 8.6207 | 44 |
| 8.0300 | 8.6091 | 45 |
| 7.9933 | 8.5904 | 46 |
| 7.9482 | 8.5793 | 47 |
| 7.9128 | 8.5605 | 48 |
| 7.8651 | 8.5490 | 49 |
| 7.8304 | 8.5362 | 50 |
| 7.7855 | 8.5210 | 51 |
| 7.7519 | 8.5072 | 52 |
| 7.7060 | 8.4953 | 53 |
| 7.6608 | 8.4803 | 54 |
| 7.6056 | 8.4718 | 55 |
| 7.5630 | 8.4561 | 56 |
| 7.5407 | 8.4417 | 57 |
| 7.4962 | 8.4266 | 58 |
| 7.4505 | 8.4215 | 59 |
| 7.4109 | 8.3973 | 60 |
| 7.3746 | 8.3906 | 61 |
| 7.3244 | 8.3758 | 62 |
| 7.2809 | 8.3652 | 63 |
| 7.2430 | 8.3495 | 64 |
| 7.1911 | 8.3423 | 65 |
| 7.1611 | 8.3227 | 66 |
| 7.1075 | 8.3119 | 67 |
| 7.0734 | 8.3032 | 68 |
| 7.0258 | 8.2899 | 69 |
| 6.9824 | 8.2817 | 70 |
| 6.9412 | 8.2611 | 71 |
| 6.8944 | 8.2550 | 72 |
| 6.8464 | 8.2429 | 73 |
| 6.8119 | 8.2240 | 74 |
| 6.7580 | 8.2199 | 75 |
| 6.7163 | 8.2044 | 76 |
| 6.6795 | 8.1819 | 77 |
| 6.6326 | 8.1847 | 78 |
| 6.5853 | 8.1733 | 79 |
| 6.5533 | 8.1524 | 80 |
| 6.4894 | 8.1398 | 81 |
| 6.4450 | 8.1347 | 82 |
| 6.3933 | 8.1220 | 83 |
| 6.3410 | 8.1031 | 84 |
| 6.3249 | 8.0906 | 85 |
| 6.2508 | 8.0915 | 86 |
| 6.2044 | 8.0682 | 87 |
| 6.1633 | 8.0565 | 88 |
| 6.1228 | 8.0491 | 89 |
| 6.0807 | 8.0392 | 90 |
| 6.0308 | 8.0189 | 91 |
| 5.9657 | 8.0094 | 92 |
| 5.9309 | 7.9979 | 93 |
| 5.8735 | 7.9804 | 94 |
| 5.8191 | 7.9702 | 95 |
| 5.7671 | 7.9677 | 96 |
| 5.7181 | 7.9494 | 97 |
| 5.6724 | 7.9402 | 98 |
| 5.6309 | 7.9209 | 99 |
| 5.5713 | 7.9112 | 100 |
| 5.5281 | 7.8977 | 101 |
| 5.4531 | 7.8884 | 102 |
| 5.4251 | 7.8717 | 103 |
| 5.3797 | 7.8637 | 104 |
| 5.3067 | 7.8538 | 105 |
| 5.2699 | 7.8436 | 106 |
| 5.2156 | 7.8301 | 107 |
| 5.1551 | 7.8185 | 108 |
| 5.1223 | 7.8017 | 109 |
| 5.0656 | 7.7927 | 110 |
| 4.9996 | 7.7754 | 111 |
| 4.9432 | 7.7580 | 112 |
| 4.9028 | 7.7489 | 113 |
| 4.8242 | 7.7411 | 114 |
| 4.7516 | 7.7196 | 115 |
| 4.7323 | 7.7101 | 116 |
| 4.6725 | 7.7042 | 117 |
| 4.6302 | 7.6833 | 118 |
| 4.5391 | 7.6679 | 119 |
| 4.5007 | 7.6575 | 120 |
| 4.4435 | 7.6530 | 121 |
| 4.3905 | 7.6396 | 122 |
| 4.3257 | 7.6236 | 123 |
| 4.2915 | 7.6106 | 124 |
| 4.1985 | 7.5916 | 125 |
| 4.1590 | 7.5937 | 126 |
| 4.1070 | 7.5777 | 127 |
| 4.0532 | 7.5640 | 128 |
| 3.9899 | 7.5493 | 129 |
| 3.9289 | 7.5384 | 130 |
| 3.8696 | 7.5265 | 131 |
| 3.7945 | 7.5198 | 132 |
| 3.7454 | 7.5054 | 133 |
| 3.6815 | 7.4894 | 134 |
| 3.6453 | 7.4796 | 135 |
| 3.5649 | 7.4746 | 136 |
| 3.5214 | 7.4608 | 137 |
| 3.4517 | 7.4473 | 138 |
| 3.3937 | 7.4363 | 139 |
| 3.3266 | 7.4263 | 140 |
| 3.2744 | 7.4128 | 141 |
| 3.2199 | 7.3996 | 142 |
| 3.1601 | 7.3887 | 143 |
| 3.0998 | 7.3737 | 144 |
| 3.0584 | 7.3648 | 145 |
| 2.9785 | 7.3565 | 146 |
| 2.9186 | 7.3513 | 147 |
| 2.8455 | 7.3410 | 148 |
| 2.7996 | 7.3305 | 149 |
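
The divergence between the two columns (train loss falls to 2.7996 while validation loss plateaus near 7.33) is easiest to see plotted. A small visualization sketch, with the lists truncated here for brevity; fill them in from the table above:

```python
import matplotlib.pyplot as plt

# First few entries from the table above; extend through epoch 149.
train_loss = [9.3341, 9.3273, 9.3176]  # ... ending at 2.7996
val_loss = [9.3329, 9.3261, 9.3147]    # ... ending at 7.3305

epochs = range(len(train_loss))
plt.plot(epochs, train_loss, label="train loss")
plt.plot(epochs, val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```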

### Framework versions

- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2