---
library_name: transformers
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: llama8b-gsm-real-and-synthetic-sftsd2
  results: []
---
# llama8b-gsm-real-and-synthetic-sftsd2
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0859
- Num Input Tokens Seen: 1,871,590
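The card does not include a usage snippet; the following is a minimal inference sketch, assuming the weights are published under a repo id of the form `<org>/llama8b-gsm-real-and-synthetic-sftsd2` (the `<org>` part is a placeholder) and that the tokenizer keeps the base model's Llama-3 chat template. The example prompt is a generic math word problem, chosen only because the model name suggests GSM-style data.

```python
# Minimal inference sketch. Assumptions: the repo id below is a placeholder,
# and a GPU with bfloat16 support is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/llama8b-gsm-real-and-synthetic-sftsd2"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is Llama-3-8B-Instruct, so we format input with its chat template.
messages = [
    {"role": "user", "content": "A baker sells 12 loaves a day for 5 days. How many loaves in total?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```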
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged sketch mapping them onto TRL's `SFTConfig` follows the list:
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
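
This is a reconstruction from the listed hyperparameters, not the authors' actual script: the dataset names are placeholders, and the 5-step evaluation interval is inferred from the results table below.

```python
# Hedged reconstruction of the TRL SFT setup. Dataset names and output_dir are
# placeholders; all hyperparameter values mirror the card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("<gsm-real-and-synthetic>", split="train")  # placeholder
eval_dataset = load_dataset("<gsm-real-and-synthetic>", split="test")    # placeholder

config = SFTConfig(
    output_dir="llama8b-gsm-real-and-synthetic-sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=2,      # train_batch_size: 2
    per_device_eval_batch_size=2,       # eval_batch_size: 2
    gradient_accumulation_steps=16,     # total_train_batch_size: 2 * 16 = 32
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=2,
    optim="adamw_torch",                # AdamW, betas=(0.9, 0.999), eps=1e-08
    eval_strategy="steps",
    eval_steps=5,                       # inferred from the results table
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```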
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.8595 | 0 |
2.1954 | 0.0109 | 5 | 1.7942 | 20052 |
1.7568 | 0.0218 | 10 | 1.5596 | 39800 |
1.4222 | 0.0327 | 15 | 1.3754 | 61596 |
1.2425 | 0.0435 | 20 | 1.2578 | 83300 |
1.245 | 0.0544 | 25 | 1.2119 | 103370 |
1.2296 | 0.0653 | 30 | 1.2007 | 122848 |
1.2248 | 0.0762 | 35 | 1.1861 | 141818 |
1.2332 | 0.0871 | 40 | 1.1735 | 163504 |
1.2422 | 0.0980 | 45 | 1.1752 | 185520 |
1.0799 | 0.1089 | 50 | 1.1702 | 204314 |
1.1989 | 0.1198 | 55 | 1.1578 | 225444 |
1.059 | 0.1306 | 60 | 1.1575 | 246092 |
1.1498 | 0.1415 | 65 | 1.1507 | 267342 |
1.162 | 0.1524 | 70 | 1.1482 | 287608 |
1.2161 | 0.1633 | 75 | 1.1499 | 305738 |
1.152 | 0.1742 | 80 | 1.1435 | 325622 |
1.1986 | 0.1851 | 85 | 1.1413 | 346692 |
1.1673 | 0.1960 | 90 | 1.1410 | 366070 |
1.1269 | 0.2069 | 95 | 1.1353 | 387020 |
1.08 | 0.2177 | 100 | 1.1345 | 408372 |
1.156 | 0.2286 | 105 | 1.1357 | 427694 |
1.1383 | 0.2395 | 110 | 1.1358 | 446868 |
1.1662 | 0.2504 | 115 | 1.1330 | 467136 |
1.1412 | 0.2613 | 120 | 1.1285 | 489648 |
1.2021 | 0.2722 | 125 | 1.1302 | 508680 |
1.1991 | 0.2831 | 130 | 1.1268 | 528950 |
1.1486 | 0.2940 | 135 | 1.1246 | 550872 |
1.1836 | 0.3048 | 140 | 1.1271 | 570586 |
1.2152 | 0.3157 | 145 | 1.1242 | 591580 |
1.2335 | 0.3266 | 150 | 1.1225 | 611042 |
1.1121 | 0.3375 | 155 | 1.1205 | 631400 |
1.2184 | 0.3484 | 160 | 1.1223 | 650180 |
1.168 | 0.3593 | 165 | 1.1189 | 669268 |
1.0356 | 0.3702 | 170 | 1.1207 | 689160 |
1.1695 | 0.3811 | 175 | 1.1166 | 709642 |
1.2066 | 0.3919 | 180 | 1.1150 | 732480 |
1.0662 | 0.4028 | 185 | 1.1146 | 754606 |
1.1363 | 0.4137 | 190 | 1.1141 | 775696 |
1.1564 | 0.4246 | 195 | 1.1128 | 795878 |
1.1145 | 0.4355 | 200 | 1.1122 | 813272 |
1.269 | 0.4464 | 205 | 1.1137 | 833902 |
1.1584 | 0.4573 | 210 | 1.1110 | 852516 |
1.16 | 0.4682 | 215 | 1.1096 | 873596 |
1.2247 | 0.4790 | 220 | 1.1107 | 894704 |
1.0643 | 0.4899 | 225 | 1.1068 | 914992 |
1.3557 | 0.5008 | 230 | 1.1081 | 935502 |
1.1839 | 0.5117 | 235 | 1.1096 | 956256 |
1.1503 | 0.5226 | 240 | 1.1039 | 977604 |
1.1692 | 0.5335 | 245 | 1.1043 | 998968 |
1.1298 | 0.5444 | 250 | 1.1034 | 1020772 |
1.1325 | 0.5553 | 255 | 1.1035 | 1041154 |
1.1725 | 0.5661 | 260 | 1.1072 | 1059292 |
1.0728 | 0.5770 | 265 | 1.1031 | 1081362 |
1.1917 | 0.5879 | 270 | 1.1007 | 1101692 |
1.0961 | 0.5988 | 275 | 1.1027 | 1121708 |
1.1835 | 0.6097 | 280 | 1.0997 | 1141298 |
1.13 | 0.6206 | 285 | 1.0996 | 1162730 |
1.1354 | 0.6315 | 290 | 1.1004 | 1182310 |
1.1653 | 0.6424 | 295 | 1.1001 | 1201826 |
1.0729 | 0.6532 | 300 | 1.0999 | 1223770 |
1.1693 | 0.6641 | 305 | 1.0971 | 1243196 |
1.1165 | 0.6750 | 310 | 1.0962 | 1265634 |
1.0549 | 0.6859 | 315 | 1.0965 | 1287874 |
1.0439 | 0.6968 | 320 | 1.0971 | 1309404 |
1.1307 | 0.7077 | 325 | 1.0959 | 1329792 |
1.0235 | 0.7186 | 330 | 1.0940 | 1349982 |
1.1361 | 0.7295 | 335 | 1.0941 | 1371082 |
1.1172 | 0.7403 | 340 | 1.0956 | 1391146 |
1.077 | 0.7512 | 345 | 1.0931 | 1412716 |
1.0474 | 0.7621 | 350 | 1.0928 | 1433118 |
1.1478 | 0.7730 | 355 | 1.0932 | 1453388 |
1.2037 | 0.7839 | 360 | 1.0916 | 1475354 |
1.1049 | 0.7948 | 365 | 1.0925 | 1495380 |
1.1686 | 0.8057 | 370 | 1.0929 | 1515826 |
1.1108 | 0.8165 | 375 | 1.0920 | 1534992 |
1.1364 | 0.8274 | 380 | 1.0900 | 1555420 |
1.0312 | 0.8383 | 385 | 1.0893 | 1576582 |
1.1645 | 0.8492 | 390 | 1.0903 | 1596630 |
1.0845 | 0.8601 | 395 | 1.0897 | 1616488 |
1.0322 | 0.8710 | 400 | 1.0899 | 1636066 |
1.1525 | 0.8819 | 405 | 1.0898 | 1655024 |
1.0964 | 0.8928 | 410 | 1.0889 | 1674606 |
1.1863 | 0.9036 | 415 | 1.0870 | 1693680 |
1.1249 | 0.9145 | 420 | 1.0880 | 1712584 |
1.0701 | 0.9254 | 425 | 1.0876 | 1734126 |
1.1546 | 0.9363 | 430 | 1.0859 | 1754370 |
1.1891 | 0.9472 | 435 | 1.0884 | 1773860 |
1.1046 | 0.9581 | 440 | 1.0861 | 1795494 |
1.1069 | 0.9690 | 445 | 1.0840 | 1814092 |
1.0491 | 0.9799 | 450 | 1.0863 | 1834998 |
1.0807 | 0.9907 | 455 | 1.0857 | 1855086 |
### Framework versions
- Transformers 4.46.0
- Pytorch 2.4.1.post300
- Datasets 2.20.0
- Tokenizers 0.20.1