# gemma-2-2b-it-bs-2-lr-5e-05-ep-3-wp-0.1-gacc-16-gnm-1.0-FP16-mx-2048-v2.3
This model is a fine-tuned version of google/gemma-2-2b-it on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6431
- Bleu: 14.6503
- Chrf: 32.9918
- Ter: 84.9561
- Gen Len: 1.0
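BLEU and chrF are overlap-based translation metrics scored 0-100 (higher is better). The scores above were presumably produced with a standard library such as sacreBLEU; as a rough illustration of what BLEU measures, here is a minimal pure-Python sketch of sentence-level BLEU (simplified: whitespace tokenization, clipped n-gram precisions, brevity penalty, and no smoothing, so any missing n-gram order collapses the score to 0 — not the exact algorithm used for this card):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counts of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU (0-100): geometric mean of clipped
    n-gram precisions times a brevity penalty. Unsmoothed, so a single
    empty precision bucket returns 0."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        if total == 0:
            return 0.0
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"), 1))  # 100.0
```

Corpus-level BLEU (as reported here) aggregates n-gram counts over the whole evaluation set rather than averaging sentence scores, so use a tool like sacreBLEU for comparable numbers.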
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
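The effective batch size follows from train_batch_size × gradient_accumulation_steps = 2 × 16 = 32, matching total_train_batch_size above. The learning rate ramps linearly from 0 to 5e-05 over the first 10% of optimizer steps (warmup_ratio 0.1), then decays linearly to 0. A minimal sketch of that schedule's shape (the total step count below is illustrative, not taken from this run):

```python
def linear_lr(step, total_steps, warmup_ratio=0.1, peak_lr=5e-05):
    """Linear warmup from 0 to peak_lr over warmup_ratio of training,
    then linear decay back to 0 (the shape of a 'linear' scheduler
    with warmup)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(warmup_steps, 1)
    return peak_lr * max(0.0, (total_steps - step) / max(total_steps - warmup_steps, 1))

# Effective batch size, and the LR at warmup peak and at the final step.
effective_batch = 2 * 16   # train_batch_size * gradient_accumulation_steps
total = 1000               # hypothetical number of optimizer steps
print(effective_batch, linear_lr(100, total), linear_lr(1000, total))  # 32 5e-05 0.0
```

With gradient accumulation, one optimizer step consumes 16 micro-batches of 2 examples, so the "Step" column in the results table counts optimizer updates, not forward passes.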
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Ter | Gen Len |
|---|---|---|---|---|---|---|---|
1.6305 | 0.0289 | 20 | 1.6468 | 4.8799 | 18.9039 | 154.0306 | 1.0 |
1.2481 | 0.0578 | 40 | 0.6159 | 5.9161 | 20.616 | 94.795 | 1.0 |
1.1889 | 0.0867 | 60 | 0.5880 | 6.3486 | 21.6392 | 95.4402 | 1.0 |
1.0469 | 0.1157 | 80 | 0.5815 | 7.1401 | 22.4082 | 92.8465 | 1.0 |
1.1488 | 0.1446 | 100 | 0.5766 | 7.2559 | 24.0665 | 93.819 | 1.0 |
1.1192 | 0.1735 | 120 | 0.5579 | 8.1656 | 24.6362 | 91.974 | 1.0 |
1.2241 | 0.2024 | 140 | 0.5569 | 8.5328 | 25.2912 | 92.9546 | 1.0 |
1.1121 | 0.2313 | 160 | 0.5570 | 8.2855 | 24.9762 | 91.7013 | 1.0 |
1.1824 | 0.2602 | 180 | 0.5574 | 7.9792 | 24.5218 | 90.7155 | 1.0 |
1.1207 | 0.2891 | 200 | 0.5591 | 8.5123 | 24.9312 | 90.7648 | 1.0 |
1.1408 | 0.3181 | 220 | 0.5651 | 9.263 | 25.8002 | 90.2563 | 1.0 |
1.1243 | 0.3470 | 240 | 0.5532 | 9.1717 | 24.5861 | 89.8496 | 1.0 |
1.1997 | 0.3759 | 260 | 0.5472 | 9.7144 | 26.5564 | 88.2731 | 1.0 |
1.2488 | 0.4048 | 280 | 0.5470 | 9.8966 | 26.6442 | 89.7022 | 1.0 |
1.2589 | 0.4337 | 300 | 0.5498 | 10.0924 | 26.5891 | 88.7918 | 1.0 |
1.1056 | 0.4626 | 320 | 0.5419 | 10.852 | 26.8782 | 88.4185 | 1.0 |
1.1813 | 0.4916 | 340 | 0.5414 | 11.7649 | 27.9435 | 87.1194 | 1.0 |
1.1506 | 0.5205 | 360 | 0.5326 | 11.125 | 28.5236 | 86.8577 | 1.0 |
1.0654 | 0.5494 | 380 | 0.5303 | 11.4262 | 28.5943 | 88.5115 | 1.0 |
1.1014 | 0.5783 | 400 | 0.5307 | 10.66 | 28.1805 | 88.1961 | 1.0 |
1.1259 | 0.6072 | 420 | 0.5356 | 10.6158 | 27.8987 | 99.1666 | 1.0 |
1.0986 | 0.6361 | 440 | 0.5291 | 12.0846 | 28.7401 | 87.9881 | 1.0 |
1.1226 | 0.6650 | 460 | 0.5338 | 12.0306 | 28.6747 | 86.7028 | 1.0 |
1.0617 | 0.6940 | 480 | 0.5278 | 12.736 | 29.7926 | 85.4065 | 1.0 |
1.0705 | 0.7229 | 500 | 0.5371 | 12.1319 | 29.6755 | 87.3459 | 1.0 |
1.0779 | 0.7518 | 520 | 0.5308 | 12.3201 | 29.5072 | 85.42 | 1.0 |
1.1431 | 0.7807 | 540 | 0.5236 | 12.0074 | 30.2736 | 86.5512 | 1.0 |
1.161 | 0.8096 | 560 | 0.5211 | 11.7915 | 29.3134 | 86.8655 | 1.0 |
1.0006 | 0.8385 | 580 | 0.5207 | 11.8458 | 29.334 | 87.0373 | 1.0 |
1.0935 | 0.8674 | 600 | 0.5227 | 11.9221 | 30.197 | 84.8232 | 1.0 |
1.0323 | 0.8964 | 620 | 0.5187 | 11.3342 | 29.6238 | 87.1094 | 1.0 |
1.0505 | 0.9253 | 640 | 0.5144 | 12.1703 | 30.24 | 85.4506 | 1.0 |
1.0925 | 0.9542 | 660 | 0.5182 | 11.8802 | 29.3723 | 85.3179 | 1.0 |
1.1714 | 0.9831 | 680 | 0.5183 | 12.985 | 30.9415 | 84.9945 | 1.0 |
0.6904 | 1.0130 | 700 | 0.5487 | 13.6607 | 31.9255 | 83.6157 | 1.0 |
0.745 | 1.0419 | 720 | 0.5377 | 13.9662 | 31.646 | 84.1042 | 1.0 |
0.7148 | 1.0708 | 740 | 0.5452 | 13.2872 | 31.5421 | 83.9215 | 1.0 |
0.643 | 1.0998 | 760 | 0.5499 | 13.63 | 31.8782 | 84.903 | 1.0 |
0.6744 | 1.1287 | 780 | 0.5575 | 14.2596 | 32.304 | 84.681 | 1.0 |
0.7373 | 1.1576 | 800 | 0.5456 | 13.5862 | 31.8253 | 84.393 | 1.0 |
0.6757 | 1.1865 | 820 | 0.5403 | 14.0562 | 32.1402 | 84.7114 | 1.0 |
0.6486 | 1.2154 | 840 | 0.5475 | 13.5419 | 32.0602 | 85.0341 | 1.0 |
0.708 | 1.2443 | 860 | 0.5417 | 13.4943 | 31.5649 | 84.5704 | 1.0 |
0.7236 | 1.2732 | 880 | 0.5395 | 12.9227 | 31.2859 | 85.3346 | 1.0 |
0.7797 | 1.3022 | 900 | 0.5476 | 13.436 | 31.7015 | 84.6371 | 1.0 |
0.6754 | 1.3311 | 920 | 0.5375 | 13.2487 | 31.5434 | 84.021 | 1.0 |
0.6798 | 1.3600 | 940 | 0.5428 | 14.0067 | 32.8167 | 83.7605 | 1.0 |
0.6646 | 1.3889 | 960 | 0.5398 | 14.19 | 32.5846 | 84.297 | 1.0 |
0.6552 | 1.4178 | 980 | 0.5380 | 14.2965 | 33.2382 | 83.9637 | 1.0 |
0.6775 | 1.4467 | 1000 | 0.5430 | 14.1695 | 32.9092 | 84.0506 | 1.0 |
0.684 | 1.4756 | 1020 | 0.5372 | 14.7039 | 33.3028 | 83.5982 | 1.0 |
0.6972 | 1.5046 | 1040 | 0.5379 | 14.3924 | 32.8448 | 83.1456 | 1.0 |
0.6508 | 1.5335 | 1060 | 0.5352 | 14.583 | 33.067 | 83.287 | 1.0 |
0.632 | 1.5624 | 1080 | 0.5468 | 14.7325 | 33.3281 | 83.8706 | 1.0 |
0.727 | 1.5913 | 1100 | 0.5331 | 14.3633 | 32.9322 | 83.4413 | 1.0 |
0.7348 | 1.6202 | 1120 | 0.5333 | 14.2374 | 32.6143 | 82.9146 | 1.0 |
0.6656 | 1.6491 | 1140 | 0.5429 | 14.201 | 32.6961 | 83.9726 | 1.0 |
0.6256 | 1.6781 | 1160 | 0.5388 | 14.5453 | 33.0475 | 83.4882 | 1.0 |
0.6924 | 1.7070 | 1180 | 0.5365 | 14.6304 | 33.1562 | 83.2688 | 1.0 |
0.6858 | 1.7359 | 1200 | 0.5393 | 15.0899 | 33.5447 | 82.996 | 1.0 |
0.5881 | 1.7648 | 1220 | 0.5392 | 14.667 | 33.3438 | 83.1356 | 1.0 |
0.5651 | 1.7937 | 1240 | 0.5402 | 15.4378 | 33.6541 | 81.7708 | 1.0 |
0.7284 | 1.8226 | 1260 | 0.5346 | 14.8227 | 33.4247 | 83.4535 | 1.0 |
0.6306 | 1.8515 | 1280 | 0.5319 | 14.8269 | 33.6046 | 83.1934 | 1.0 |
0.6572 | 1.8805 | 1300 | 0.5320 | 15.1053 | 34.0897 | 82.3907 | 1.0 |
0.6535 | 1.9094 | 1320 | 0.5357 | 14.8839 | 33.4102 | 82.4797 | 1.0 |
0.6261 | 1.9383 | 1340 | 0.5380 | 14.9109 | 33.6909 | 83.3605 | 1.0 |
0.6464 | 1.9672 | 1360 | 0.5285 | 14.9013 | 33.5382 | 83.2153 | 1.0 |
0.6824 | 1.9961 | 1380 | 0.5267 | 15.0163 | 33.6645 | 82.6823 | 1.0 |
0.2613 | 2.0260 | 1400 | 0.6124 | 14.48 | 33.1279 | 84.4074 | 1.0 |
0.3124 | 2.0549 | 1420 | 0.6413 | 14.1054 | 32.5095 | 85.0131 | 1.0 |
0.3049 | 2.0839 | 1440 | 0.6374 | 14.2445 | 32.4733 | 85.507 | 1.0 |
0.2714 | 2.1128 | 1460 | 0.6361 | 14.2699 | 32.3035 | 85.755 | 1.0 |
0.3301 | 2.1417 | 1480 | 0.6309 | 13.8199 | 32.526 | 85.4411 | 1.0 |
0.2641 | 2.1706 | 1500 | 0.6380 | 14.3405 | 32.8969 | 85.4122 | 1.0 |
0.262 | 2.1995 | 1520 | 0.6387 | 14.2671 | 33.0908 | 85.1801 | 1.0 |
0.2673 | 2.2284 | 1540 | 0.6384 | 14.396 | 32.982 | 85.1313 | 1.0 |
0.3174 | 2.2573 | 1560 | 0.6401 | 14.5154 | 33.0967 | 85.131 | 1.0 |
0.26 | 2.2863 | 1580 | 0.6357 | 14.393 | 33.2324 | 85.1789 | 1.0 |
0.2839 | 2.3152 | 1600 | 0.6403 | 14.4803 | 32.8922 | 85.4798 | 1.0 |
0.2653 | 2.3441 | 1620 | 0.6384 | 14.7389 | 33.293 | 84.9899 | 1.0 |
0.35 | 2.3730 | 1640 | 0.6381 | 14.598 | 33.0187 | 84.9201 | 1.0 |
0.3045 | 2.4019 | 1660 | 0.6416 | 14.458 | 32.6372 | 85.1604 | 1.0 |
0.2818 | 2.4308 | 1680 | 0.6440 | 14.5244 | 32.856 | 85.571 | 1.0 |
0.2652 | 2.4597 | 1700 | 0.6450 | 14.6749 | 32.9037 | 85.3041 | 1.0 |
0.2693 | 2.4887 | 1720 | 0.6454 | 14.5879 | 32.8362 | 85.3871 | 1.0 |
0.3171 | 2.5176 | 1740 | 0.6439 | 14.7316 | 32.953 | 85.1034 | 1.0 |
0.2609 | 2.5465 | 1760 | 0.6454 | 14.4935 | 32.9601 | 84.997 | 1.0 |
0.2822 | 2.5754 | 1780 | 0.6446 | 14.3962 | 32.9424 | 85.2155 | 1.0 |
0.3131 | 2.6043 | 1800 | 0.6427 | 14.4512 | 32.8568 | 85.0961 | 1.0 |
0.2947 | 2.6332 | 1820 | 0.6434 | 14.372 | 32.7709 | 85.1213 | 1.0 |
0.2673 | 2.6621 | 1840 | 0.6432 | 14.4881 | 32.8784 | 85.1516 | 1.0 |
0.2909 | 2.6911 | 1860 | 0.6423 | 14.6748 | 33.1419 | 85.0915 | 1.0 |
0.2783 | 2.7200 | 1880 | 0.6416 | 14.8233 | 33.1044 | 85.0892 | 1.0 |
0.2306 | 2.7489 | 1900 | 0.6416 | 14.9317 | 33.182 | 84.7472 | 1.0 |
0.2913 | 2.7778 | 1920 | 0.6431 | 14.8385 | 33.1646 | 84.9655 | 1.0 |
0.322 | 2.8067 | 1940 | 0.6429 | 14.7481 | 33.1404 | 84.955 | 1.0 |
0.2826 | 2.8356 | 1960 | 0.6421 | 14.7345 | 33.0948 | 84.8726 | 1.0 |
0.2529 | 2.8646 | 1980 | 0.6424 | 14.721 | 33.0926 | 84.9997 | 1.0 |
0.352 | 2.8935 | 2000 | 0.6426 | 14.6022 | 32.9792 | 85.1056 | 1.0 |
0.2622 | 2.9224 | 2020 | 0.6425 | 14.8069 | 33.1242 | 84.8546 | 1.0 |
0.3045 | 2.9513 | 2040 | 0.6432 | 14.7134 | 33.0836 | 84.9625 | 1.0 |
0.2893 | 2.9802 | 2060 | 0.6431 | 14.6503 | 32.9918 | 84.9561 | 1.0 |
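TER (translation edit rate) in the table is lower-is-better: the number of edits needed to turn the hypothesis into the reference, per 100 reference words. As a rough illustration, a word-level Levenshtein sketch (simplified: real TER also counts block shifts as single edits; use a standard implementation for comparable scores):

```python
def simple_ter(hypothesis, reference):
    """Word-level edit distance / reference length * 100.
    Simplification of TER: only insertions, deletions, and
    substitutions; no shift operations."""
    hyp, ref = hypothesis.split(), reference.split()
    # Rolling-row Levenshtein dynamic program.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,          # delete h
                           cur[j - 1] + 1,       # insert r
                           prev[j - 1] + (h != r)))  # substitute/match
        prev = cur
    return 100 * prev[-1] / max(len(ref), 1)

print(simple_ter("the cat sat", "the cat sat on the mat"))  # 50.0
```

A TER of ~85, as in the final row, means roughly 85 edits per 100 reference words were still needed on the evaluation set.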
### Framework versions
- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.0