
llama8b-gsm-real-and-synthetic-sftsd2

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0859
  • Num Input Tokens Seen: 1871590
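The original card does not include a usage example; the following is a minimal, hedged loading sketch using the standard Transformers chat API. The repo id `jkazdan/llama8b-gsm-real-and-synthetic-sftsd2` is taken from this card's model tree, and the prompt is an arbitrary GSM-style placeholder, not from the training data.

```python
# Minimal loading sketch (not from the original card). Assumes the checkpoint
# is published as jkazdan/llama8b-gsm-real-and-synthetic-sftsd2 and that
# `accelerate` is installed for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-and-synthetic-sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the weights are published in BF16
    device_map="auto",
)

# Llama 3 Instruct expects its chat template; a GSM-style question as example.
messages = [
    {"role": "user", "content": "A baker sells 12 loaves a day for 7 days. How many loaves in total?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```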

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
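
As a sketch (not part of the original card), the listed values map onto a Transformers `TrainingArguments` roughly as follows; `output_dir` is hypothetical, and the 5-step evaluation cadence is inferred from the results table below rather than stated in the card.

```python
# Hedged reconstruction of the training configuration from the values above.
# output_dir is hypothetical; eval/logging cadence is inferred from the
# results table, not stated in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-and-synthetic-sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=2,
    gradient_accumulation_steps=16,  # 2 x 16 = total train batch size 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",  # inferred: validation loss logged every 5 steps
    eval_steps=5,
    logging_steps=5,
    bf16=True,  # assumption, based on the published BF16 weights
)
```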

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 2.1954        | 0.0109 | 5    | 1.7942          | 20052             |
| 1.7568        | 0.0218 | 10   | 1.5596          | 39800             |
| 1.4222        | 0.0327 | 15   | 1.3754          | 61596             |
| 1.2425        | 0.0435 | 20   | 1.2578          | 83300             |
| 1.245         | 0.0544 | 25   | 1.2119          | 103370            |
| 1.2296        | 0.0653 | 30   | 1.2007          | 122848            |
| 1.2248        | 0.0762 | 35   | 1.1861          | 141818            |
| 1.2332        | 0.0871 | 40   | 1.1735          | 163504            |
| 1.2422        | 0.0980 | 45   | 1.1752          | 185520            |
| 1.0799        | 0.1089 | 50   | 1.1702          | 204314            |
| 1.1989        | 0.1198 | 55   | 1.1578          | 225444            |
| 1.059         | 0.1306 | 60   | 1.1575          | 246092            |
| 1.1498        | 0.1415 | 65   | 1.1507          | 267342            |
| 1.162         | 0.1524 | 70   | 1.1482          | 287608            |
| 1.2161        | 0.1633 | 75   | 1.1499          | 305738            |
| 1.152         | 0.1742 | 80   | 1.1435          | 325622            |
| 1.1986        | 0.1851 | 85   | 1.1413          | 346692            |
| 1.1673        | 0.1960 | 90   | 1.1410          | 366070            |
| 1.1269        | 0.2069 | 95   | 1.1353          | 387020            |
| 1.08          | 0.2177 | 100  | 1.1345          | 408372            |
| 1.156         | 0.2286 | 105  | 1.1357          | 427694            |
| 1.1383        | 0.2395 | 110  | 1.1358          | 446868            |
| 1.1662        | 0.2504 | 115  | 1.1330          | 467136            |
| 1.1412        | 0.2613 | 120  | 1.1285          | 489648            |
| 1.2021        | 0.2722 | 125  | 1.1302          | 508680            |
| 1.1991        | 0.2831 | 130  | 1.1268          | 528950            |
| 1.1486        | 0.2940 | 135  | 1.1246          | 550872            |
| 1.1836        | 0.3048 | 140  | 1.1271          | 570586            |
| 1.2152        | 0.3157 | 145  | 1.1242          | 591580            |
| 1.2335        | 0.3266 | 150  | 1.1225          | 611042            |
| 1.1121        | 0.3375 | 155  | 1.1205          | 631400            |
| 1.2184        | 0.3484 | 160  | 1.1223          | 650180            |
| 1.168         | 0.3593 | 165  | 1.1189          | 669268            |
| 1.0356        | 0.3702 | 170  | 1.1207          | 689160            |
| 1.1695        | 0.3811 | 175  | 1.1166          | 709642            |
| 1.2066        | 0.3919 | 180  | 1.1150          | 732480            |
| 1.0662        | 0.4028 | 185  | 1.1146          | 754606            |
| 1.1363        | 0.4137 | 190  | 1.1141          | 775696            |
| 1.1564        | 0.4246 | 195  | 1.1128          | 795878            |
| 1.1145        | 0.4355 | 200  | 1.1122          | 813272            |
| 1.269         | 0.4464 | 205  | 1.1137          | 833902            |
| 1.1584        | 0.4573 | 210  | 1.1110          | 852516            |
| 1.16          | 0.4682 | 215  | 1.1096          | 873596            |
| 1.2247        | 0.4790 | 220  | 1.1107          | 894704            |
| 1.0643        | 0.4899 | 225  | 1.1068          | 914992            |
| 1.3557        | 0.5008 | 230  | 1.1081          | 935502            |
| 1.1839        | 0.5117 | 235  | 1.1096          | 956256            |
| 1.1503        | 0.5226 | 240  | 1.1039          | 977604            |
| 1.1692        | 0.5335 | 245  | 1.1043          | 998968            |
| 1.1298        | 0.5444 | 250  | 1.1034          | 1020772           |
| 1.1325        | 0.5553 | 255  | 1.1035          | 1041154           |
| 1.1725        | 0.5661 | 260  | 1.1072          | 1059292           |
| 1.0728        | 0.5770 | 265  | 1.1031          | 1081362           |
| 1.1917        | 0.5879 | 270  | 1.1007          | 1101692           |
| 1.0961        | 0.5988 | 275  | 1.1027          | 1121708           |
| 1.1835        | 0.6097 | 280  | 1.0997          | 1141298           |
| 1.13          | 0.6206 | 285  | 1.0996          | 1162730           |
| 1.1354        | 0.6315 | 290  | 1.1004          | 1182310           |
| 1.1653        | 0.6424 | 295  | 1.1001          | 1201826           |
| 1.0729        | 0.6532 | 300  | 1.0999          | 1223770           |
| 1.1693        | 0.6641 | 305  | 1.0971          | 1243196           |
| 1.1165        | 0.6750 | 310  | 1.0962          | 1265634           |
| 1.0549        | 0.6859 | 315  | 1.0965          | 1287874           |
| 1.0439        | 0.6968 | 320  | 1.0971          | 1309404           |
| 1.1307        | 0.7077 | 325  | 1.0959          | 1329792           |
| 1.0235        | 0.7186 | 330  | 1.0940          | 1349982           |
| 1.1361        | 0.7295 | 335  | 1.0941          | 1371082           |
| 1.1172        | 0.7403 | 340  | 1.0956          | 1391146           |
| 1.077         | 0.7512 | 345  | 1.0931          | 1412716           |
| 1.0474        | 0.7621 | 350  | 1.0928          | 1433118           |
| 1.1478        | 0.7730 | 355  | 1.0932          | 1453388           |
| 1.2037        | 0.7839 | 360  | 1.0916          | 1475354           |
| 1.1049        | 0.7948 | 365  | 1.0925          | 1495380           |
| 1.1686        | 0.8057 | 370  | 1.0929          | 1515826           |
| 1.1108        | 0.8165 | 375  | 1.0920          | 1534992           |
| 1.1364        | 0.8274 | 380  | 1.0900          | 1555420           |
| 1.0312        | 0.8383 | 385  | 1.0893          | 1576582           |
| 1.1645        | 0.8492 | 390  | 1.0903          | 1596630           |
| 1.0845        | 0.8601 | 395  | 1.0897          | 1616488           |
| 1.0322        | 0.8710 | 400  | 1.0899          | 1636066           |
| 1.1525        | 0.8819 | 405  | 1.0898          | 1655024           |
| 1.0964        | 0.8928 | 410  | 1.0889          | 1674606           |
| 1.1863        | 0.9036 | 415  | 1.0870          | 1693680           |
| 1.1249        | 0.9145 | 420  | 1.0880          | 1712584           |
| 1.0701        | 0.9254 | 425  | 1.0876          | 1734126           |
| 1.1546        | 0.9363 | 430  | 1.0859          | 1754370           |
| 1.1891        | 0.9472 | 435  | 1.0884          | 1773860           |
| 1.1046        | 0.9581 | 440  | 1.0861          | 1795494           |
| 1.1069        | 0.9690 | 445  | 1.0840          | 1814092           |
| 1.0491        | 0.9799 | 450  | 1.0863          | 1834998           |
| 1.0807        | 0.9907 | 455  | 1.0857          | 1855086           |

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1
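
To check a local environment against the versions above, a small hedged sketch (not part of the original card); PyTorch's `.post300` suffix is a platform-specific build, so only the base version is compared.

```python
# Hedged environment check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.46.0"),
    "torch": (torch.__version__.split("+")[0], "2.4.1"),  # card lists 2.4.1.post300
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.20.1"),
}
for name, (found, want) in expected.items():
    status = "OK" if found.startswith(want) else f"expected {want}"
    print(f"{name}: {found} ({status})")
```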
Model size: 8.03B parameters, stored as BF16 safetensors.