tfa_output_2025_m05_d13_t13h_01m_39s
This model is a fine-tuned version of Qwen/Qwen3-8B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1960
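The card does not say how the loss is computed. Assuming it is the mean per-token cross-entropy in nats, which is the usual objective when fine-tuning a causal language model with Transformers, it can be read as a perplexity via exp(loss). A minimal sketch under that assumption (the function name is illustrative):

```python
import math


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity.

    Assumes the loss is averaged over tokens and measured in nats; this is
    an assumption about how the evaluation loss above was computed.
    """
    return math.exp(mean_ce_loss)


eval_loss = 1.1960  # final evaluation loss reported above
print(f"perplexity: {perplexity_from_loss(eval_loss):.2f}")  # perplexity: 3.31
```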
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: OptimizerNames.PAGED_ADAMW (paged AdamW) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
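The batch and schedule settings above combine as follows: total_train_batch_size is train_batch_size × gradient_accumulation_steps × number of devices, so 8 = 1 × 8 implies a single device, and constant_with_warmup ramps the learning rate linearly from 0 to its peak over the warmup steps and then holds it constant. The sketch below follows the formula of Transformers' constant-with-warmup schedule and rounds the warmup length up from the ratio. The total number of optimizer steps is not stated in this card; 5083 is an assumption inferred from the Epoch column of the results table (step 5075 is logged at epoch 0.9984). The function names are illustrative, not library APIs.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 1e-07
TRAIN_BATCH_SIZE = 1  # per device
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.05

# Assumption: total optimizer steps inferred from the Epoch column
# (step 5075 is logged at epoch 0.9984, so one epoch is about 5083 steps).
TOTAL_STEPS = 5083


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(total_steps * ratio)


def constant_with_warmup_lr(step: int, peak_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to peak_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return peak_lr * step / max(1, n_warmup)
    return peak_lr


n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
print("warmup steps:", n_warmup)
for step in (0, 100, n_warmup, 5075):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

With these numbers the warmup covers the first 255 optimizer steps, after which every update uses the peak rate of 1e-07.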
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0 | 0 | 1.1976 |
2.246 | 0.0049 | 25 | 1.1977 |
2.2727 | 0.0098 | 50 | 1.1977 |
2.2721 | 0.0148 | 75 | 1.1977 |
2.1365 | 0.0197 | 100 | 1.1980 |
2.5064 | 0.0246 | 125 | 1.1976 |
2.1168 | 0.0295 | 150 | 1.1977 |
2.0195 | 0.0344 | 175 | 1.1977 |
2.1274 | 0.0393 | 200 | 1.1973 |
2.1423 | 0.0443 | 225 | 1.1975 |
2.0581 | 0.0492 | 250 | 1.1973 |
2.3088 | 0.0541 | 275 | 1.1971 |
2.2377 | 0.0590 | 300 | 1.1975 |
2.0745 | 0.0639 | 325 | 1.1972 |
2.3906 | 0.0689 | 350 | 1.1972 |
2.2249 | 0.0738 | 375 | 1.1968 |
2.0537 | 0.0787 | 400 | 1.1970 |
2.4452 | 0.0836 | 425 | 1.1968 |
2.5002 | 0.0885 | 450 | 1.1965 |
2.1056 | 0.0934 | 475 | 1.1967 |
2.139 | 0.0984 | 500 | 1.1969 |
2.448 | 0.1033 | 525 | 1.1969 |
2.3778 | 0.1082 | 550 | 1.1968 |
2.0933 | 0.1131 | 575 | 1.1966 |
2.2988 | 0.1180 | 600 | 1.1966 |
2.3635 | 0.1230 | 625 | 1.1966 |
2.0839 | 0.1279 | 650 | 1.1967 |
2.2946 | 0.1328 | 675 | 1.1965 |
2.2107 | 0.1377 | 700 | 1.1967 |
2.2321 | 0.1426 | 725 | 1.1963 |
2.1609 | 0.1475 | 750 | 1.1964 |
2.0338 | 0.1525 | 775 | 1.1963 |
2.0854 | 0.1574 | 800 | 1.1965 |
2.4936 | 0.1623 | 825 | 1.1965 |
2.3859 | 0.1672 | 850 | 1.1963 |
2.4055 | 0.1721 | 875 | 1.1967 |
2.1933 | 0.1771 | 900 | 1.1962 |
2.1406 | 0.1820 | 925 | 1.1966 |
2.3661 | 0.1869 | 950 | 1.1962 |
2.1484 | 0.1918 | 975 | 1.1964 |
2.3269 | 0.1967 | 1000 | 1.1962 |
2.2781 | 0.2016 | 1025 | 1.1963 |
2.2589 | 0.2066 | 1050 | 1.1963 |
2.14 | 0.2115 | 1075 | 1.1965 |
2.0599 | 0.2164 | 1100 | 1.1964 |
2.2361 | 0.2213 | 1125 | 1.1962 |
2.2609 | 0.2262 | 1150 | 1.1967 |
2.4061 | 0.2312 | 1175 | 1.1966 |
2.3443 | 0.2361 | 1200 | 1.1962 |
2.3939 | 0.2410 | 1225 | 1.1965 |
2.2703 | 0.2459 | 1250 | 1.1964 |
2.4083 | 0.2508 | 1275 | 1.1963 |
1.9759 | 0.2557 | 1300 | 1.1964 |
2.0368 | 0.2607 | 1325 | 1.1964 |
2.3993 | 0.2656 | 1350 | 1.1965 |
2.3491 | 0.2705 | 1375 | 1.1961 |
2.2197 | 0.2754 | 1400 | 1.1959 |
2.1712 | 0.2803 | 1425 | 1.1963 |
2.2401 | 0.2853 | 1450 | 1.1963 |
2.3364 | 0.2902 | 1475 | 1.1960 |
2.2556 | 0.2951 | 1500 | 1.1963 |
2.482 | 0.3000 | 1525 | 1.1961 |
2.1299 | 0.3049 | 1550 | 1.1960 |
2.1765 | 0.3098 | 1575 | 1.1962 |
2.2247 | 0.3148 | 1600 | 1.1962 |
2.3216 | 0.3197 | 1625 | 1.1962 |
2.0913 | 0.3246 | 1650 | 1.1960 |
2.2222 | 0.3295 | 1675 | 1.1963 |
2.1564 | 0.3344 | 1700 | 1.1964 |
2.1817 | 0.3394 | 1725 | 1.1961 |
2.2866 | 0.3443 | 1750 | 1.1965 |
2.1426 | 0.3492 | 1775 | 1.1963 |
2.3738 | 0.3541 | 1800 | 1.1963 |
2.334 | 0.3590 | 1825 | 1.1960 |
2.1325 | 0.3639 | 1850 | 1.1964 |
2.1522 | 0.3689 | 1875 | 1.1960 |
2.0358 | 0.3738 | 1900 | 1.1958 |
2.2019 | 0.3787 | 1925 | 1.1959 |
2.2238 | 0.3836 | 1950 | 1.1961 |
2.1258 | 0.3885 | 1975 | 1.1961 |
2.3393 | 0.3935 | 2000 | 1.1961 |
2.0858 | 0.3984 | 2025 | 1.1963 |
2.1936 | 0.4033 | 2050 | 1.1959 |
2.1086 | 0.4082 | 2075 | 1.1963 |
2.0318 | 0.4131 | 2100 | 1.1961 |
2.1221 | 0.4180 | 2125 | 1.1963 |
2.2419 | 0.4230 | 2150 | 1.1959 |
2.2179 | 0.4279 | 2175 | 1.1961 |
2.1388 | 0.4328 | 2200 | 1.1963 |
2.3871 | 0.4377 | 2225 | 1.1962 |
2.3968 | 0.4426 | 2250 | 1.1962 |
2.3812 | 0.4476 | 2275 | 1.1959 |
2.1811 | 0.4525 | 2300 | 1.1959 |
2.2433 | 0.4574 | 2325 | 1.1959 |
2.2599 | 0.4623 | 2350 | 1.1963 |
2.3472 | 0.4672 | 2375 | 1.1959 |
2.2542 | 0.4722 | 2400 | 1.1957 |
2.1694 | 0.4771 | 2425 | 1.1960 |
2.2399 | 0.4820 | 2450 | 1.1960 |
2.2255 | 0.4869 | 2475 | 1.1959 |
2.3706 | 0.4918 | 2500 | 1.1957 |
2.382 | 0.4967 | 2525 | 1.1962 |
2.2745 | 0.5017 | 2550 | 1.1960 |
2.0935 | 0.5066 | 2575 | 1.1960 |
2.2579 | 0.5115 | 2600 | 1.1963 |
2.3317 | 0.5164 | 2625 | 1.1959 |
2.4675 | 0.5213 | 2650 | 1.1962 |
2.3471 | 0.5263 | 2675 | 1.1963 |
2.3394 | 0.5312 | 2700 | 1.1960 |
2.1928 | 0.5361 | 2725 | 1.1964 |
2.3198 | 0.5410 | 2750 | 1.1961 |
2.2023 | 0.5459 | 2775 | 1.1960 |
2.1285 | 0.5508 | 2800 | 1.1956 |
2.3145 | 0.5558 | 2825 | 1.1960 |
2.2811 | 0.5607 | 2850 | 1.1961 |
2.3009 | 0.5656 | 2875 | 1.1957 |
2.1485 | 0.5705 | 2900 | 1.1959 |
2.2421 | 0.5754 | 2925 | 1.1962 |
2.341 | 0.5804 | 2950 | 1.1960 |
2.061 | 0.5853 | 2975 | 1.1959 |
2.2235 | 0.5902 | 3000 | 1.1960 |
2.156 | 0.5951 | 3025 | 1.1963 |
2.1171 | 0.6000 | 3050 | 1.1959 |
2.1974 | 0.6049 | 3075 | 1.1958 |
2.202 | 0.6099 | 3100 | 1.1961 |
2.2813 | 0.6148 | 3125 | 1.1958 |
2.2252 | 0.6197 | 3150 | 1.1956 |
2.1522 | 0.6246 | 3175 | 1.1958 |
2.4226 | 0.6295 | 3200 | 1.1960 |
2.1715 | 0.6345 | 3225 | 1.1961 |
2.0513 | 0.6394 | 3250 | 1.1959 |
2.0935 | 0.6443 | 3275 | 1.1960 |
2.122 | 0.6492 | 3300 | 1.1958 |
2.2468 | 0.6541 | 3325 | 1.1962 |
2.2454 | 0.6590 | 3350 | 1.1960 |
2.1698 | 0.6640 | 3375 | 1.1960 |
2.3447 | 0.6689 | 3400 | 1.1960 |
2.2465 | 0.6738 | 3425 | 1.1961 |
2.3927 | 0.6787 | 3450 | 1.1962 |
2.2866 | 0.6836 | 3475 | 1.1958 |
2.1289 | 0.6886 | 3500 | 1.1959 |
2.1554 | 0.6935 | 3525 | 1.1962 |
2.199 | 0.6984 | 3550 | 1.1959 |
2.1435 | 0.7033 | 3575 | 1.1957 |
2.2876 | 0.7082 | 3600 | 1.1961 |
2.0832 | 0.7131 | 3625 | 1.1959 |
2.2221 | 0.7181 | 3650 | 1.1956 |
2.4723 | 0.7230 | 3675 | 1.1955 |
2.2343 | 0.7279 | 3700 | 1.1959 |
2.4596 | 0.7328 | 3725 | 1.1959 |
2.2936 | 0.7377 | 3750 | 1.1960 |
2.3877 | 0.7427 | 3775 | 1.1958 |
2.2913 | 0.7476 | 3800 | 1.1961 |
2.363 | 0.7525 | 3825 | 1.1961 |
2.2781 | 0.7574 | 3850 | 1.1955 |
2.2674 | 0.7623 | 3875 | 1.1958 |
2.2614 | 0.7672 | 3900 | 1.1959 |
2.2656 | 0.7722 | 3925 | 1.1958 |
2.0995 | 0.7771 | 3950 | 1.1962 |
2.3779 | 0.7820 | 3975 | 1.1963 |
2.1707 | 0.7869 | 4000 | 1.1959 |
2.1654 | 0.7918 | 4025 | 1.1960 |
2.2345 | 0.7968 | 4050 | 1.1959 |
2.2606 | 0.8017 | 4075 | 1.1960 |
2.1682 | 0.8066 | 4100 | 1.1958 |
2.377 | 0.8115 | 4125 | 1.1957 |
2.1217 | 0.8164 | 4150 | 1.1960 |
2.0862 | 0.8213 | 4175 | 1.1958 |
2.4084 | 0.8263 | 4200 | 1.1955 |
2.3288 | 0.8312 | 4225 | 1.1962 |
2.1782 | 0.8361 | 4250 | 1.1960 |
2.095 | 0.8410 | 4275 | 1.1959 |
2.1659 | 0.8459 | 4300 | 1.1962 |
2.216 | 0.8509 | 4325 | 1.1957 |
2.185 | 0.8558 | 4350 | 1.1961 |
2.1729 | 0.8607 | 4375 | 1.1957 |
2.302 | 0.8656 | 4400 | 1.1959 |
2.1731 | 0.8705 | 4425 | 1.1959 |
2.4731 | 0.8754 | 4450 | 1.1961 |
2.3024 | 0.8804 | 4475 | 1.1959 |
2.255 | 0.8853 | 4500 | 1.1960 |
2.4396 | 0.8902 | 4525 | 1.1955 |
2.1763 | 0.8951 | 4550 | 1.1958 |
2.5524 | 0.9000 | 4575 | 1.1958 |
2.2964 | 0.9050 | 4600 | 1.1959 |
2.1957 | 0.9099 | 4625 | 1.1957 |
2.1413 | 0.9148 | 4650 | 1.1957 |
2.3669 | 0.9197 | 4675 | 1.1960 |
2.4035 | 0.9246 | 4700 | 1.1956 |
2.2217 | 0.9295 | 4725 | 1.1958 |
2.3217 | 0.9345 | 4750 | 1.1961 |
2.1792 | 0.9394 | 4775 | 1.1957 |
2.0124 | 0.9443 | 4800 | 1.1955 |
2.1503 | 0.9492 | 4825 | 1.1960 |
2.2102 | 0.9541 | 4850 | 1.1958 |
2.2302 | 0.9591 | 4875 | 1.1960 |
2.2598 | 0.9640 | 4900 | 1.1960 |
2.0807 | 0.9689 | 4925 | 1.1957 |
2.3987 | 0.9738 | 4950 | 1.1959 |
2.2453 | 0.9787 | 4975 | 1.1959 |
2.2694 | 0.9836 | 5000 | 1.1961 |
2.4709 | 0.9886 | 5025 | 1.1959 |
2.2237 | 0.9935 | 5050 | 1.1957 |
2.1821 | 0.9984 | 5075 | 1.1960 |
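To plot or compare the logged losses, the rows of the table above can be parsed with a few lines of Python. This is a minimal sketch that assumes the "Training Loss | Epoch | Step | Validation Loss |" layout used here and maps the "No log" placeholder to None; the type and function names are illustrative, and the sample rows are copied from the table.

```python
from typing import List, NamedTuple, Optional


class LogRow(NamedTuple):
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float


def parse_results_table(text: str) -> List[LogRow]:
    """Parse 'Training Loss | Epoch | Step | Validation Loss |' rows."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the '---|---' separator row.
        if cells[0] == "Training Loss" or set(cells[0]) <= {"-"}:
            continue
        train = None if cells[0] == "No log" else float(cells[0])
        rows.append(LogRow(train, float(cells[1]), int(cells[2]), float(cells[3])))
    return rows


SAMPLE = """
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0 | 0 | 1.1976 |
2.246 | 0.0049 | 25 | 1.1977 |
2.1821 | 0.9984 | 5075 | 1.1960 |
"""

rows = parse_results_table(SAMPLE)
best = min(rows, key=lambda r: r.val_loss)
print(len(rows), best.step, best.val_loss)
```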
Framework versions
- Transformers 4.51.3
- PyTorch 2.1.2+cu121
- Datasets 3.6.0
- Tokenizers 0.21.1
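To compare a local environment against the versions listed above, the installed distributions can be queried with the standard library's importlib.metadata. A minimal sketch: the distribution names (transformers, torch, datasets, tokenizers) are the usual PyPI names and are an assumption, local build tags such as +cu121 are ignored when comparing, and the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Versions listed in this card; "torch" is the PyPI name for PyTorch.
EXPECTED = {
    "transformers": "4.51.3",
    "torch": "2.1.2+cu121",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def base_version(v: str) -> str:
    """Drop a local build tag such as '+cu121'."""
    return v.split("+", 1)[0]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected: Dict[str, str], installed: Dict[str, Optional[str]]) -> Dict[str, tuple]:
    """Map each package that is missing or differs in base version to (expected, installed)."""
    out = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != base_version(want):
            out[name] = (want, have)
    return out


for name, (want, have) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
    print(f"{name}: card lists {want}, found {have or 'not installed'}")
```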