tfa_output_2025_m05_d13_t13h_01m_39s

This model is a fine-tuned version of Qwen/Qwen3-8B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1960
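For context, a loss of 1.1960 corresponds to a perplexity of exp(1.1960) ≈ 3.31 on the evaluation set, assuming the reported loss is the usual mean token-level cross-entropy. A minimal sketch of loading the checkpoint with Transformers (the repository id is this model's; dtype, device placement, and the prompt are illustrative choices, not part of the card):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the fine-tuned checkpoint; bfloat16 and device_map are illustrative.
    model_id = "brando/tfa_output_2025_m05_d13_t13h_01m_39s"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Illustrative generation call; sampling settings are not specified by the card.
    inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))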

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto TrainingArguments):

  • learning_rate: 1e-07
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
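For reference, a minimal sketch of how the settings above map onto Hugging Face TrainingArguments; the output directory and any flag not in the list above are assumptions, not a record of the actual run:

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="tfa_output",          # assumed, not stated in the card
        learning_rate=1e-7,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=8,
        seed=42,
        gradient_accumulation_steps=8,    # effective train batch size: 1 * 8 = 8
        optim="paged_adamw_32bit",        # OptimizerNames.PAGED_ADAMW
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        lr_scheduler_type="constant_with_warmup",
        warmup_ratio=0.05,
        num_train_epochs=1,
    )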

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 1.1976
2.246 0.0049 25 1.1977
2.2727 0.0098 50 1.1977
2.2721 0.0148 75 1.1977
2.1365 0.0197 100 1.1980
2.5064 0.0246 125 1.1976
2.1168 0.0295 150 1.1977
2.0195 0.0344 175 1.1977
2.1274 0.0393 200 1.1973
2.1423 0.0443 225 1.1975
2.0581 0.0492 250 1.1973
2.3088 0.0541 275 1.1971
2.2377 0.0590 300 1.1975
2.0745 0.0639 325 1.1972
2.3906 0.0689 350 1.1972
2.2249 0.0738 375 1.1968
2.0537 0.0787 400 1.1970
2.4452 0.0836 425 1.1968
2.5002 0.0885 450 1.1965
2.1056 0.0934 475 1.1967
2.139 0.0984 500 1.1969
2.448 0.1033 525 1.1969
2.3778 0.1082 550 1.1968
2.0933 0.1131 575 1.1966
2.2988 0.1180 600 1.1966
2.3635 0.1230 625 1.1966
2.0839 0.1279 650 1.1967
2.2946 0.1328 675 1.1965
2.2107 0.1377 700 1.1967
2.2321 0.1426 725 1.1963
2.1609 0.1475 750 1.1964
2.0338 0.1525 775 1.1963
2.0854 0.1574 800 1.1965
2.4936 0.1623 825 1.1965
2.3859 0.1672 850 1.1963
2.4055 0.1721 875 1.1967
2.1933 0.1771 900 1.1962
2.1406 0.1820 925 1.1966
2.3661 0.1869 950 1.1962
2.1484 0.1918 975 1.1964
2.3269 0.1967 1000 1.1962
2.2781 0.2016 1025 1.1963
2.2589 0.2066 1050 1.1963
2.14 0.2115 1075 1.1965
2.0599 0.2164 1100 1.1964
2.2361 0.2213 1125 1.1962
2.2609 0.2262 1150 1.1967
2.4061 0.2312 1175 1.1966
2.3443 0.2361 1200 1.1962
2.3939 0.2410 1225 1.1965
2.2703 0.2459 1250 1.1964
2.4083 0.2508 1275 1.1963
1.9759 0.2557 1300 1.1964
2.0368 0.2607 1325 1.1964
2.3993 0.2656 1350 1.1965
2.3491 0.2705 1375 1.1961
2.2197 0.2754 1400 1.1959
2.1712 0.2803 1425 1.1963
2.2401 0.2853 1450 1.1963
2.3364 0.2902 1475 1.1960
2.2556 0.2951 1500 1.1963
2.482 0.3000 1525 1.1961
2.1299 0.3049 1550 1.1960
2.1765 0.3098 1575 1.1962
2.2247 0.3148 1600 1.1962
2.3216 0.3197 1625 1.1962
2.0913 0.3246 1650 1.1960
2.2222 0.3295 1675 1.1963
2.1564 0.3344 1700 1.1964
2.1817 0.3394 1725 1.1961
2.2866 0.3443 1750 1.1965
2.1426 0.3492 1775 1.1963
2.3738 0.3541 1800 1.1963
2.334 0.3590 1825 1.1960
2.1325 0.3639 1850 1.1964
2.1522 0.3689 1875 1.1960
2.0358 0.3738 1900 1.1958
2.2019 0.3787 1925 1.1959
2.2238 0.3836 1950 1.1961
2.1258 0.3885 1975 1.1961
2.3393 0.3935 2000 1.1961
2.0858 0.3984 2025 1.1963
2.1936 0.4033 2050 1.1959
2.1086 0.4082 2075 1.1963
2.0318 0.4131 2100 1.1961
2.1221 0.4180 2125 1.1963
2.2419 0.4230 2150 1.1959
2.2179 0.4279 2175 1.1961
2.1388 0.4328 2200 1.1963
2.3871 0.4377 2225 1.1962
2.3968 0.4426 2250 1.1962
2.3812 0.4476 2275 1.1959
2.1811 0.4525 2300 1.1959
2.2433 0.4574 2325 1.1959
2.2599 0.4623 2350 1.1963
2.3472 0.4672 2375 1.1959
2.2542 0.4722 2400 1.1957
2.1694 0.4771 2425 1.1960
2.2399 0.4820 2450 1.1960
2.2255 0.4869 2475 1.1959
2.3706 0.4918 2500 1.1957
2.382 0.4967 2525 1.1962
2.2745 0.5017 2550 1.1960
2.0935 0.5066 2575 1.1960
2.2579 0.5115 2600 1.1963
2.3317 0.5164 2625 1.1959
2.4675 0.5213 2650 1.1962
2.3471 0.5263 2675 1.1963
2.3394 0.5312 2700 1.1960
2.1928 0.5361 2725 1.1964
2.3198 0.5410 2750 1.1961
2.2023 0.5459 2775 1.1960
2.1285 0.5508 2800 1.1956
2.3145 0.5558 2825 1.1960
2.2811 0.5607 2850 1.1961
2.3009 0.5656 2875 1.1957
2.1485 0.5705 2900 1.1959
2.2421 0.5754 2925 1.1962
2.341 0.5804 2950 1.1960
2.061 0.5853 2975 1.1959
2.2235 0.5902 3000 1.1960
2.156 0.5951 3025 1.1963
2.1171 0.6000 3050 1.1959
2.1974 0.6049 3075 1.1958
2.202 0.6099 3100 1.1961
2.2813 0.6148 3125 1.1958
2.2252 0.6197 3150 1.1956
2.1522 0.6246 3175 1.1958
2.4226 0.6295 3200 1.1960
2.1715 0.6345 3225 1.1961
2.0513 0.6394 3250 1.1959
2.0935 0.6443 3275 1.1960
2.122 0.6492 3300 1.1958
2.2468 0.6541 3325 1.1962
2.2454 0.6590 3350 1.1960
2.1698 0.6640 3375 1.1960
2.3447 0.6689 3400 1.1960
2.2465 0.6738 3425 1.1961
2.3927 0.6787 3450 1.1962
2.2866 0.6836 3475 1.1958
2.1289 0.6886 3500 1.1959
2.1554 0.6935 3525 1.1962
2.199 0.6984 3550 1.1959
2.1435 0.7033 3575 1.1957
2.2876 0.7082 3600 1.1961
2.0832 0.7131 3625 1.1959
2.2221 0.7181 3650 1.1956
2.4723 0.7230 3675 1.1955
2.2343 0.7279 3700 1.1959
2.4596 0.7328 3725 1.1959
2.2936 0.7377 3750 1.1960
2.3877 0.7427 3775 1.1958
2.2913 0.7476 3800 1.1961
2.363 0.7525 3825 1.1961
2.2781 0.7574 3850 1.1955
2.2674 0.7623 3875 1.1958
2.2614 0.7672 3900 1.1959
2.2656 0.7722 3925 1.1958
2.0995 0.7771 3950 1.1962
2.3779 0.7820 3975 1.1963
2.1707 0.7869 4000 1.1959
2.1654 0.7918 4025 1.1960
2.2345 0.7968 4050 1.1959
2.2606 0.8017 4075 1.1960
2.1682 0.8066 4100 1.1958
2.377 0.8115 4125 1.1957
2.1217 0.8164 4150 1.1960
2.0862 0.8213 4175 1.1958
2.4084 0.8263 4200 1.1955
2.3288 0.8312 4225 1.1962
2.1782 0.8361 4250 1.1960
2.095 0.8410 4275 1.1959
2.1659 0.8459 4300 1.1962
2.216 0.8509 4325 1.1957
2.185 0.8558 4350 1.1961
2.1729 0.8607 4375 1.1957
2.302 0.8656 4400 1.1959
2.1731 0.8705 4425 1.1959
2.4731 0.8754 4450 1.1961
2.3024 0.8804 4475 1.1959
2.255 0.8853 4500 1.1960
2.4396 0.8902 4525 1.1955
2.1763 0.8951 4550 1.1958
2.5524 0.9000 4575 1.1958
2.2964 0.9050 4600 1.1959
2.1957 0.9099 4625 1.1957
2.1413 0.9148 4650 1.1957
2.3669 0.9197 4675 1.1960
2.4035 0.9246 4700 1.1956
2.2217 0.9295 4725 1.1958
2.3217 0.9345 4750 1.1961
2.1792 0.9394 4775 1.1957
2.0124 0.9443 4800 1.1955
2.1503 0.9492 4825 1.1960
2.2102 0.9541 4850 1.1958
2.2302 0.9591 4875 1.1960
2.2598 0.9640 4900 1.1960
2.0807 0.9689 4925 1.1957
2.3987 0.9738 4950 1.1959
2.2453 0.9787 4975 1.1959
2.2694 0.9836 5000 1.1961
2.4709 0.9886 5025 1.1959
2.2237 0.9935 5050 1.1957
2.1821 0.9984 5075 1.1960
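Validation loss is essentially flat over the epoch (1.1976 at step 0, 1.1960 at the final evaluation), which is consistent with the very small constant learning rate of 1e-07. The curve above can also be recovered from the run directory; a minimal sketch, assuming the standard trainer_state.json written by the Hugging Face Trainer:

    import json

    # log_history interleaves training logs ({"loss", "epoch", "step", ...})
    # and evaluation logs ({"eval_loss", "epoch", "step", ...}).
    with open("trainer_state.json") as f:
        state = json.load(f)

    eval_points = [
        (entry["step"], entry["eval_loss"])
        for entry in state["log_history"]
        if "eval_loss" in entry
    ]
    print(eval_points[:3])  # e.g. [(0, 1.1976), (25, 1.1977), (50, 1.1977)], per the table above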

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.1.2+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.1