gemma-2-2b-it-bs-2-lr-5e-05-ep-3-wp-0.1-gacc-16-gnm-1.0-FP16-mx-2048-v2.3

This model is a fine-tuned version of google/gemma-2-2b-it on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6431
  • Bleu: 14.6503
  • Chrf: 32.9918
  • Ter: 84.9561
  • Gen Len: 1.0
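
The Bleu, Chrf, and Ter values above are corpus-level machine-translation metrics. Below is a minimal sketch of how such scores can be computed with the Hugging Face evaluate library; the prediction and reference strings are placeholders, not the actual evaluation set, and the author's exact evaluation script is not documented here.

```python
# Minimal sketch (not the card author's exact script) of computing BLEU,
# chrF, and TER with the `evaluate` library. Inputs below are placeholders.
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
ter = evaluate.load("ter")

predictions = ["The cat sits on the mat."]          # model outputs
references = [["The cat is sitting on the mat."]]   # one list of references per prediction

print("BLEU:", bleu.compute(predictions=predictions, references=references)["score"])
print("chrF:", chrf.compute(predictions=predictions, references=references)["score"])
print("TER:", ter.compute(predictions=predictions, references=references)["score"])
```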

Model description

More information needed

Intended uses & limitations

More information needed
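
No usage notes are provided yet. As an illustration only, the checkpoint can be loaded like any other gemma-2-2b-it fine-tune with transformers; the repository id comes from this card, while the prompt, dtype, and generation settings below are assumptions rather than documented behaviour of this fine-tune.

```python
# Hedged usage sketch: loading this checkpoint for generation with transformers.
# The chat prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BounharAbdelaziz/gemma-2-2b-it-bs-2-lr-5e-05-ep-3-wp-0.1-gacc-16-gnm-1.0-FP16-mx-2048-v2.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Translate: Hello, how are you?"}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```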

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
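
A hedged sketch of how these hyperparameters map onto a transformers TrainingArguments object is shown below; values not listed above (max_grad_norm, fp16, output_dir) are inferred from the model name or are placeholders and may not match the actual run.

```python
# Hedged reconstruction of the training configuration from the listed
# hyperparameters; unlisted values are inferred from the model name.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-it-finetune",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,       # effective batch size 2 * 16 = 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    max_grad_norm=1.0,                    # "gnm-1.0" in the model name (assumption)
    fp16=True,                            # "FP16" in the model name (assumption)
    seed=42,
)
```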

Training results

Training Loss Epoch Step Validation Loss Bleu Chrf Ter Gen Len
1.6305 0.0289 20 1.6468 4.8799 18.9039 154.0306 1.0
1.2481 0.0578 40 0.6159 5.9161 20.616 94.795 1.0
1.1889 0.0867 60 0.5880 6.3486 21.6392 95.4402 1.0
1.0469 0.1157 80 0.5815 7.1401 22.4082 92.8465 1.0
1.1488 0.1446 100 0.5766 7.2559 24.0665 93.819 1.0
1.1192 0.1735 120 0.5579 8.1656 24.6362 91.974 1.0
1.2241 0.2024 140 0.5569 8.5328 25.2912 92.9546 1.0
1.1121 0.2313 160 0.5570 8.2855 24.9762 91.7013 1.0
1.1824 0.2602 180 0.5574 7.9792 24.5218 90.7155 1.0
1.1207 0.2891 200 0.5591 8.5123 24.9312 90.7648 1.0
1.1408 0.3181 220 0.5651 9.263 25.8002 90.2563 1.0
1.1243 0.3470 240 0.5532 9.1717 24.5861 89.8496 1.0
1.1997 0.3759 260 0.5472 9.7144 26.5564 88.2731 1.0
1.2488 0.4048 280 0.5470 9.8966 26.6442 89.7022 1.0
1.2589 0.4337 300 0.5498 10.0924 26.5891 88.7918 1.0
1.1056 0.4626 320 0.5419 10.852 26.8782 88.4185 1.0
1.1813 0.4916 340 0.5414 11.7649 27.9435 87.1194 1.0
1.1506 0.5205 360 0.5326 11.125 28.5236 86.8577 1.0
1.0654 0.5494 380 0.5303 11.4262 28.5943 88.5115 1.0
1.1014 0.5783 400 0.5307 10.66 28.1805 88.1961 1.0
1.1259 0.6072 420 0.5356 10.6158 27.8987 99.1666 1.0
1.0986 0.6361 440 0.5291 12.0846 28.7401 87.9881 1.0
1.1226 0.6650 460 0.5338 12.0306 28.6747 86.7028 1.0
1.0617 0.6940 480 0.5278 12.736 29.7926 85.4065 1.0
1.0705 0.7229 500 0.5371 12.1319 29.6755 87.3459 1.0
1.0779 0.7518 520 0.5308 12.3201 29.5072 85.42 1.0
1.1431 0.7807 540 0.5236 12.0074 30.2736 86.5512 1.0
1.161 0.8096 560 0.5211 11.7915 29.3134 86.8655 1.0
1.0006 0.8385 580 0.5207 11.8458 29.334 87.0373 1.0
1.0935 0.8674 600 0.5227 11.9221 30.197 84.8232 1.0
1.0323 0.8964 620 0.5187 11.3342 29.6238 87.1094 1.0
1.0505 0.9253 640 0.5144 12.1703 30.24 85.4506 1.0
1.0925 0.9542 660 0.5182 11.8802 29.3723 85.3179 1.0
1.1714 0.9831 680 0.5183 12.985 30.9415 84.9945 1.0
0.6904 1.0130 700 0.5487 13.6607 31.9255 83.6157 1.0
0.745 1.0419 720 0.5377 13.9662 31.646 84.1042 1.0
0.7148 1.0708 740 0.5452 13.2872 31.5421 83.9215 1.0
0.643 1.0998 760 0.5499 13.63 31.8782 84.903 1.0
0.6744 1.1287 780 0.5575 14.2596 32.304 84.681 1.0
0.7373 1.1576 800 0.5456 13.5862 31.8253 84.393 1.0
0.6757 1.1865 820 0.5403 14.0562 32.1402 84.7114 1.0
0.6486 1.2154 840 0.5475 13.5419 32.0602 85.0341 1.0
0.708 1.2443 860 0.5417 13.4943 31.5649 84.5704 1.0
0.7236 1.2732 880 0.5395 12.9227 31.2859 85.3346 1.0
0.7797 1.3022 900 0.5476 13.436 31.7015 84.6371 1.0
0.6754 1.3311 920 0.5375 13.2487 31.5434 84.021 1.0
0.6798 1.3600 940 0.5428 14.0067 32.8167 83.7605 1.0
0.6646 1.3889 960 0.5398 14.19 32.5846 84.297 1.0
0.6552 1.4178 980 0.5380 14.2965 33.2382 83.9637 1.0
0.6775 1.4467 1000 0.5430 14.1695 32.9092 84.0506 1.0
0.684 1.4756 1020 0.5372 14.7039 33.3028 83.5982 1.0
0.6972 1.5046 1040 0.5379 14.3924 32.8448 83.1456 1.0
0.6508 1.5335 1060 0.5352 14.583 33.067 83.287 1.0
0.632 1.5624 1080 0.5468 14.7325 33.3281 83.8706 1.0
0.727 1.5913 1100 0.5331 14.3633 32.9322 83.4413 1.0
0.7348 1.6202 1120 0.5333 14.2374 32.6143 82.9146 1.0
0.6656 1.6491 1140 0.5429 14.201 32.6961 83.9726 1.0
0.6256 1.6781 1160 0.5388 14.5453 33.0475 83.4882 1.0
0.6924 1.7070 1180 0.5365 14.6304 33.1562 83.2688 1.0
0.6858 1.7359 1200 0.5393 15.0899 33.5447 82.996 1.0
0.5881 1.7648 1220 0.5392 14.667 33.3438 83.1356 1.0
0.5651 1.7937 1240 0.5402 15.4378 33.6541 81.7708 1.0
0.7284 1.8226 1260 0.5346 14.8227 33.4247 83.4535 1.0
0.6306 1.8515 1280 0.5319 14.8269 33.6046 83.1934 1.0
0.6572 1.8805 1300 0.5320 15.1053 34.0897 82.3907 1.0
0.6535 1.9094 1320 0.5357 14.8839 33.4102 82.4797 1.0
0.6261 1.9383 1340 0.5380 14.9109 33.6909 83.3605 1.0
0.6464 1.9672 1360 0.5285 14.9013 33.5382 83.2153 1.0
0.6824 1.9961 1380 0.5267 15.0163 33.6645 82.6823 1.0
0.2613 2.0260 1400 0.6124 14.48 33.1279 84.4074 1.0
0.3124 2.0549 1420 0.6413 14.1054 32.5095 85.0131 1.0
0.3049 2.0839 1440 0.6374 14.2445 32.4733 85.507 1.0
0.2714 2.1128 1460 0.6361 14.2699 32.3035 85.755 1.0
0.3301 2.1417 1480 0.6309 13.8199 32.526 85.4411 1.0
0.2641 2.1706 1500 0.6380 14.3405 32.8969 85.4122 1.0
0.262 2.1995 1520 0.6387 14.2671 33.0908 85.1801 1.0
0.2673 2.2284 1540 0.6384 14.396 32.982 85.1313 1.0
0.3174 2.2573 1560 0.6401 14.5154 33.0967 85.131 1.0
0.26 2.2863 1580 0.6357 14.393 33.2324 85.1789 1.0
0.2839 2.3152 1600 0.6403 14.4803 32.8922 85.4798 1.0
0.2653 2.3441 1620 0.6384 14.7389 33.293 84.9899 1.0
0.35 2.3730 1640 0.6381 14.598 33.0187 84.9201 1.0
0.3045 2.4019 1660 0.6416 14.458 32.6372 85.1604 1.0
0.2818 2.4308 1680 0.6440 14.5244 32.856 85.571 1.0
0.2652 2.4597 1700 0.6450 14.6749 32.9037 85.3041 1.0
0.2693 2.4887 1720 0.6454 14.5879 32.8362 85.3871 1.0
0.3171 2.5176 1740 0.6439 14.7316 32.953 85.1034 1.0
0.2609 2.5465 1760 0.6454 14.4935 32.9601 84.997 1.0
0.2822 2.5754 1780 0.6446 14.3962 32.9424 85.2155 1.0
0.3131 2.6043 1800 0.6427 14.4512 32.8568 85.0961 1.0
0.2947 2.6332 1820 0.6434 14.372 32.7709 85.1213 1.0
0.2673 2.6621 1840 0.6432 14.4881 32.8784 85.1516 1.0
0.2909 2.6911 1860 0.6423 14.6748 33.1419 85.0915 1.0
0.2783 2.7200 1880 0.6416 14.8233 33.1044 85.0892 1.0
0.2306 2.7489 1900 0.6416 14.9317 33.182 84.7472 1.0
0.2913 2.7778 1920 0.6431 14.8385 33.1646 84.9655 1.0
0.322 2.8067 1940 0.6429 14.7481 33.1404 84.955 1.0
0.2826 2.8356 1960 0.6421 14.7345 33.0948 84.8726 1.0
0.2529 2.8646 1980 0.6424 14.721 33.0926 84.9997 1.0
0.352 2.8935 2000 0.6426 14.6022 32.9792 85.1056 1.0
0.2622 2.9224 2020 0.6425 14.8069 33.1242 84.8546 1.0
0.3045 2.9513 2040 0.6432 14.7134 33.0836 84.9625 1.0
0.2893 2.9802 2060 0.6431 14.6503 32.9918 84.9561 1.0

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.0