train_stsb_1745333589

This model is a PEFT adapter fine-tuned from google/gemma-3-1b-it on the stsb dataset (the Semantic Textual Similarity Benchmark). It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.3004
  • Num Input Tokens Seen: 61089232
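
To try the adapter, the sketch below shows one way to load it on top of the base model with peft and transformers. The repo id rbelanec/train_stsb_1745333589 is taken from this card; the prompt template is only a guess, since the card does not document how STS-B pairs were serialized for training.

```python
# Minimal sketch: load the base model and apply this PEFT adapter.
# Assumes the adapter lives at "rbelanec/train_stsb_1745333589" on the Hub
# and that you have access to google/gemma-3-1b-it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "rbelanec/train_stsb_1745333589")
model.eval()

# The exact prompt format depends on how the STS-B pairs were serialized;
# this template is illustrative only.
prompt = (
    "sentence1: A man is playing a guitar.\n"
    "sentence2: A person plays an instrument.\n"
    "similarity:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```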

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
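
For reference, these settings map roughly onto transformers.TrainingArguments as sketched below. Only the values listed above are grounded in this card; output_dir is a placeholder, and the eval cadence (every 200 steps) is inferred from the results table.

```python
# Sketch of a TrainingArguments config mirroring the hyperparameter list.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_stsb_1745333589",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,       # 4 x 4 = total train batch size of 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40000,
    eval_strategy="steps",               # evaluation every 200 steps, per the table
    eval_steps=200,
)
```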

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 3.4322 | 0.6182 | 200 | 3.3989 | 305312 |
| 1.8164 | 1.2349 | 400 | 1.7951 | 610048 |
| 1.4417 | 1.8532 | 600 | 1.4198 | 917664 |
| 1.2241 | 2.4699 | 800 | 1.2246 | 1223104 |
| 1.0684 | 3.0866 | 1000 | 1.0839 | 1528432 |
| 1.0292 | 3.7048 | 1200 | 0.9670 | 1837520 |
| 0.8334 | 4.3215 | 1400 | 0.8561 | 2143216 |
| 0.7228 | 4.9397 | 1600 | 0.7671 | 2448176 |
| 0.6282 | 5.5564 | 1800 | 0.6984 | 2752768 |
| 0.639 | 6.1731 | 2000 | 0.6433 | 3059504 |
| 0.5502 | 6.7913 | 2200 | 0.5964 | 3364688 |
| 0.5518 | 7.4080 | 2400 | 0.5614 | 3672432 |
| 0.4631 | 8.0247 | 2600 | 0.5373 | 3978272 |
| 0.4895 | 8.6430 | 2800 | 0.5146 | 4285856 |
| 0.4762 | 9.2597 | 3000 | 0.4976 | 4588608 |
| 0.4262 | 9.8779 | 3200 | 0.4827 | 4894432 |
| 0.4203 | 10.4946 | 3400 | 0.4712 | 5200528 |
| 0.3634 | 11.1113 | 3600 | 0.4612 | 5504960 |
| 0.3689 | 11.7295 | 3800 | 0.4505 | 5808800 |
| 0.3246 | 12.3462 | 4000 | 0.4438 | 6114608 |
| 0.4121 | 12.9645 | 4200 | 0.4352 | 6419376 |
| 0.377 | 13.5811 | 4400 | 0.4290 | 6725664 |
| 0.3873 | 14.1978 | 4600 | 0.4208 | 7030208 |
| 0.4336 | 14.8161 | 4800 | 0.4164 | 7335712 |
| 0.2933 | 15.4328 | 5000 | 0.4105 | 7641232 |
| 0.3062 | 16.0495 | 5200 | 0.4053 | 7945360 |
| 0.2797 | 16.6677 | 5400 | 0.4014 | 8252048 |
| 0.3236 | 17.2844 | 5600 | 0.3953 | 8557024 |
| 0.365 | 17.9026 | 5800 | 0.3897 | 8862080 |
| 0.3191 | 18.5193 | 6000 | 0.3872 | 9167248 |
| 0.2915 | 19.1360 | 6200 | 0.3811 | 9472816 |
| 0.3482 | 19.7543 | 6400 | 0.3782 | 9779344 |
| 0.2646 | 20.3709 | 6600 | 0.3745 | 10085888 |
| 0.2759 | 20.9892 | 6800 | 0.3703 | 10391904 |
| 0.3038 | 21.6059 | 7000 | 0.3680 | 10697664 |
| 0.313 | 22.2226 | 7200 | 0.3669 | 11000832 |
| 0.2537 | 22.8408 | 7400 | 0.3650 | 11308384 |
| 0.2978 | 23.4575 | 7600 | 0.3594 | 11614048 |
| 0.3065 | 24.0742 | 7800 | 0.3579 | 11917328 |
| 0.2827 | 24.6924 | 8000 | 0.3559 | 12224848 |
| 0.258 | 25.3091 | 8200 | 0.3545 | 12530128 |
| 0.2731 | 25.9274 | 8400 | 0.3516 | 12838032 |
| 0.3197 | 26.5440 | 8600 | 0.3514 | 13142096 |
| 0.2863 | 27.1607 | 8800 | 0.3498 | 13447712 |
| 0.3435 | 27.7790 | 9000 | 0.3483 | 13751968 |
| 0.255 | 28.3957 | 9200 | 0.3486 | 14060176 |
| 0.294 | 29.0124 | 9400 | 0.3449 | 14362928 |
| 0.2661 | 29.6306 | 9600 | 0.3429 | 14669168 |
| 0.3367 | 30.2473 | 9800 | 0.3407 | 14973568 |
| 0.2961 | 30.8655 | 10000 | 0.3408 | 15279840 |
| 0.2853 | 31.4822 | 10200 | 0.3395 | 15586352 |
| 0.2809 | 32.0989 | 10400 | 0.3365 | 15891232 |
| 0.2892 | 32.7172 | 10600 | 0.3371 | 16197472 |
| 0.2193 | 33.3338 | 10800 | 0.3341 | 16500992 |
| 0.2344 | 33.9521 | 11000 | 0.3343 | 16807808 |
| 0.2338 | 34.5688 | 11200 | 0.3325 | 17112928 |
| 0.2304 | 35.1855 | 11400 | 0.3333 | 17420016 |
| 0.3045 | 35.8037 | 11600 | 0.3309 | 17726608 |
| 0.2831 | 36.4204 | 11800 | 0.3292 | 18030288 |
| 0.2686 | 37.0371 | 12000 | 0.3293 | 18337584 |
| 0.2861 | 37.6553 | 12200 | 0.3285 | 18640720 |
| 0.263 | 38.2720 | 12400 | 0.3284 | 18946400 |
| 0.2913 | 38.8903 | 12600 | 0.3276 | 19254240 |
| 0.273 | 39.5070 | 12800 | 0.3253 | 19558592 |
| 0.2321 | 40.1236 | 13000 | 0.3258 | 19861168 |
| 0.2927 | 40.7419 | 13200 | 0.3250 | 20169712 |
| 0.2533 | 41.3586 | 13400 | 0.3245 | 20475008 |
| 0.2798 | 41.9768 | 13600 | 0.3234 | 20782016 |
| 0.2724 | 42.5935 | 13800 | 0.3227 | 21085440 |
| 0.2989 | 43.2102 | 14000 | 0.3217 | 21391616 |
| 0.2997 | 43.8284 | 14200 | 0.3211 | 21696768 |
| 0.2512 | 44.4451 | 14400 | 0.3206 | 22001488 |
| 0.2577 | 45.0618 | 14600 | 0.3228 | 22307216 |
| 0.256 | 45.6801 | 14800 | 0.3205 | 22612016 |
| 0.2296 | 46.2968 | 15000 | 0.3194 | 22917744 |
| 0.2335 | 46.9150 | 15200 | 0.3181 | 23224720 |
| 0.2468 | 47.5317 | 15400 | 0.3173 | 23531040 |
| 0.2409 | 48.1484 | 15600 | 0.3178 | 23836048 |
| 0.2948 | 48.7666 | 15800 | 0.3170 | 24140240 |
| 0.29 | 49.3833 | 16000 | 0.3168 | 24445104 |
| 0.2291 | 50.0 | 16200 | 0.3159 | 24750256 |
| 0.2685 | 50.6182 | 16400 | 0.3161 | 25055056 |
| 0.2823 | 51.2349 | 16600 | 0.3141 | 25360976 |
| 0.2348 | 51.8532 | 16800 | 0.3148 | 25669136 |
| 0.2632 | 52.4699 | 17000 | 0.3150 | 25972400 |
| 0.2542 | 53.0866 | 17200 | 0.3144 | 26280272 |
| 0.2637 | 53.7048 | 17400 | 0.3134 | 26583184 |
| 0.2026 | 54.3215 | 17600 | 0.3146 | 26891152 |
| 0.2113 | 54.9397 | 17800 | 0.3138 | 27197008 |
| 0.3108 | 55.5564 | 18000 | 0.3130 | 27500160 |
| 0.2994 | 56.1731 | 18200 | 0.3103 | 27805616 |
| 0.317 | 56.7913 | 18400 | 0.3123 | 28112336 |
| 0.2248 | 57.4080 | 18600 | 0.3116 | 28419888 |
| 0.2551 | 58.0247 | 18800 | 0.3115 | 28724096 |
| 0.2405 | 58.6430 | 19000 | 0.3120 | 29031328 |
| 0.2108 | 59.2597 | 19200 | 0.3110 | 29336560 |
| 0.2993 | 59.8779 | 19400 | 0.3109 | 29642224 |
| 0.2417 | 60.4946 | 19600 | 0.3098 | 29947456 |
| 0.3663 | 61.1113 | 19800 | 0.3119 | 30252288 |
| 0.2843 | 61.7295 | 20000 | 0.3084 | 30557408 |
| 0.2279 | 62.3462 | 20200 | 0.3108 | 30862656 |
| 0.2419 | 62.9645 | 20400 | 0.3091 | 31169472 |
| 0.2191 | 63.5811 | 20600 | 0.3100 | 31474928 |
| 0.2659 | 64.1978 | 20800 | 0.3088 | 31778496 |
| 0.237 | 64.8161 | 21000 | 0.3085 | 32086304 |
| 0.3355 | 65.4328 | 21200 | 0.3081 | 32389328 |
| 0.3007 | 66.0495 | 21400 | 0.3080 | 32696656 |
| 0.2563 | 66.6677 | 21600 | 0.3079 | 33001008 |
| 0.233 | 67.2844 | 21800 | 0.3067 | 33306288 |
| 0.3712 | 67.9026 | 22000 | 0.3072 | 33612592 |
| 0.24 | 68.5193 | 22200 | 0.3070 | 33914992 |
| 0.3087 | 69.1360 | 22400 | 0.3068 | 34219808 |
| 0.2091 | 69.7543 | 22600 | 0.3067 | 34525536 |
| 0.2151 | 70.3709 | 22800 | 0.3072 | 34829856 |
| 0.2072 | 70.9892 | 23000 | 0.3063 | 35134560 |
| 0.2173 | 71.6059 | 23200 | 0.3068 | 35439168 |
| 0.3593 | 72.2226 | 23400 | 0.3070 | 35744608 |
| 0.2151 | 72.8408 | 23600 | 0.3069 | 36050688 |
| 0.2393 | 73.4575 | 23800 | 0.3055 | 36353808 |
| 0.2728 | 74.0742 | 24000 | 0.3064 | 36660560 |
| 0.2662 | 74.6924 | 24200 | 0.3075 | 36968464 |
| 0.2926 | 75.3091 | 24400 | 0.3049 | 37273264 |
| 0.2 | 75.9274 | 24600 | 0.3047 | 37578896 |
| 0.2329 | 76.5440 | 24800 | 0.3066 | 37882832 |
| 0.2172 | 77.1607 | 25000 | 0.3037 | 38187312 |
| 0.2493 | 77.7790 | 25200 | 0.3038 | 38492720 |
| 0.332 | 78.3957 | 25400 | 0.3039 | 38796864 |
| 0.2746 | 79.0124 | 25600 | 0.3050 | 39103824 |
| 0.2415 | 79.6306 | 25800 | 0.3037 | 39410448 |
| 0.3165 | 80.2473 | 26000 | 0.3044 | 39715280 |
| 0.2209 | 80.8655 | 26200 | 0.3050 | 40021520 |
| 0.2747 | 81.4822 | 26400 | 0.3036 | 40325376 |
| 0.267 | 82.0989 | 26600 | 0.3046 | 40631296 |
| 0.2835 | 82.7172 | 26800 | 0.3045 | 40937312 |
| 0.2254 | 83.3338 | 27000 | 0.3038 | 41240464 |
| 0.2325 | 83.9521 | 27200 | 0.3017 | 41550128 |
| 0.2357 | 84.5688 | 27400 | 0.3032 | 41855152 |
| 0.2491 | 85.1855 | 27600 | 0.3025 | 42158912 |
| 0.2423 | 85.8037 | 27800 | 0.3035 | 42461856 |
| 0.2651 | 86.4204 | 28000 | 0.3035 | 42769760 |
| 0.2752 | 87.0371 | 28200 | 0.3036 | 43074800 |
| 0.25 | 87.6553 | 28400 | 0.3036 | 43378640 |
| 0.2937 | 88.2720 | 28600 | 0.3020 | 43683840 |
| 0.2567 | 88.8903 | 28800 | 0.3020 | 43988256 |
| 0.2193 | 89.5070 | 29000 | 0.3031 | 44294256 |
| 0.216 | 90.1236 | 29200 | 0.3020 | 44598464 |
| 0.2053 | 90.7419 | 29400 | 0.3032 | 44904928 |
| 0.1961 | 91.3586 | 29600 | 0.3039 | 45208784 |
| 0.2677 | 91.9768 | 29800 | 0.3022 | 45516336 |
| 0.2594 | 92.5935 | 30000 | 0.3024 | 45820432 |
| 0.2166 | 93.2102 | 30200 | 0.3032 | 46127408 |
| 0.2801 | 93.8284 | 30400 | 0.3027 | 46431888 |
| 0.2391 | 94.4451 | 30600 | 0.3027 | 46736368 |
| 0.2193 | 95.0618 | 30800 | 0.3018 | 47043472 |
| 0.2231 | 95.6801 | 31000 | 0.3024 | 47348976 |
| 0.271 | 96.2968 | 31200 | 0.3021 | 47652864 |
| 0.2043 | 96.9150 | 31400 | 0.3029 | 47959872 |
| 0.2605 | 97.5317 | 31600 | 0.3008 | 48265392 |
| 0.2414 | 98.1484 | 31800 | 0.3015 | 48569984 |
| 0.2337 | 98.7666 | 32000 | 0.3013 | 48874016 |
| 0.2581 | 99.3833 | 32200 | 0.3023 | 49181056 |
| 0.2602 | 100.0 | 32400 | 0.3036 | 49485120 |
| 0.2153 | 100.6182 | 32600 | 0.3017 | 49790304 |
| 0.2732 | 101.2349 | 32800 | 0.3024 | 50097008 |
| 0.2764 | 101.8532 | 33000 | 0.3016 | 50403088 |
| 0.2199 | 102.4699 | 33200 | 0.3029 | 50707088 |
| 0.2872 | 103.0866 | 33400 | 0.3009 | 51010144 |
| 0.2565 | 103.7048 | 33600 | 0.3013 | 51318976 |
| 0.2709 | 104.3215 | 33800 | 0.3021 | 51623136 |
| 0.2471 | 104.9397 | 34000 | 0.3022 | 51930240 |
| 0.2691 | 105.5564 | 34200 | 0.3021 | 52233888 |
| 0.2033 | 106.1731 | 34400 | 0.3004 | 52541008 |
| 0.243 | 106.7913 | 34600 | 0.3016 | 52845904 |
| 0.1853 | 107.4080 | 34800 | 0.3026 | 53150720 |
| 0.2903 | 108.0247 | 35000 | 0.3013 | 53456816 |
| 0.2021 | 108.6430 | 35200 | 0.3010 | 53760816 |
| 0.2481 | 109.2597 | 35400 | 0.3016 | 54066160 |
| 0.2505 | 109.8779 | 35600 | 0.3018 | 54371888 |
| 0.225 | 110.4946 | 35800 | 0.3015 | 54676672 |
| 0.2609 | 111.1113 | 36000 | 0.3015 | 54983008 |
| 0.2401 | 111.7295 | 36200 | 0.3015 | 55289472 |
| 0.2103 | 112.3462 | 36400 | 0.3021 | 55591632 |
| 0.2725 | 112.9645 | 36600 | 0.3016 | 55898640 |
| 0.257 | 113.5811 | 36800 | 0.3025 | 56202288 |
| 0.2189 | 114.1978 | 37000 | 0.3018 | 56510384 |
| 0.2521 | 114.8161 | 37200 | 0.3011 | 56816880 |
| 0.2003 | 115.4328 | 37400 | 0.3017 | 57119232 |
| 0.2821 | 116.0495 | 37600 | 0.3021 | 57424224 |
| 0.2208 | 116.6677 | 37800 | 0.3016 | 57729856 |
| 0.215 | 117.2844 | 38000 | 0.3015 | 58034352 |
| 0.2325 | 117.9026 | 38200 | 0.3020 | 58342576 |
| 0.2558 | 118.5193 | 38400 | 0.3031 | 58648384 |
| 0.2495 | 119.1360 | 38600 | 0.3023 | 58953568 |
| 0.2401 | 119.7543 | 38800 | 0.3019 | 59257088 |
| 0.2171 | 120.3709 | 39000 | 0.3018 | 59562208 |
| 0.2523 | 120.9892 | 39200 | 0.3015 | 59867712 |
| 0.3036 | 121.6059 | 39400 | 0.3016 | 60173616 |
| 0.2435 | 122.2226 | 39600 | 0.3023 | 60476592 |
| 0.2313 | 122.8408 | 39800 | 0.3018 | 60782960 |
| 0.2238 | 123.4575 | 40000 | 0.3018 | 61089232 |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
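
To reproduce the training environment, it may help to pin these exact versions. The snippet below is a convenience check, not part of the original card.

```python
# Compare the installed library versions against those listed above.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.15.1",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}
for name, mod in [("peft", peft), ("transformers", transformers),
                  ("torch", torch), ("datasets", datasets),
                  ("tokenizers", tokenizers)]:
    print(f"{name}: installed {mod.__version__}, card used {expected[name]}")
```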