# smartmind-cyberone-20250405
This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.0068
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
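As a consistency check, the derived quantities in the list above follow directly from the base hyperparameters (a minimal sketch; the total step count of 28900 is read off the final row of the results table, and the single-process assumption is inferred from 64 = 8 × 8):

```python
# Derive the dependent training quantities from the reported hyperparameters.
train_batch_size = 8             # per-device micro-batch size
gradient_accumulation_steps = 8
num_processes = 1                # assumption: implied by 64 / (8 * 8)

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_processes
)
print(total_train_batch_size)    # matches the reported total of 64

# Warmup length under lr_scheduler_warmup_ratio = 0.1; the results table
# ends near optimizer step 28900 at epoch ~5.0.
warmup_ratio = 0.1
total_steps = 28900
warmup_steps = int(warmup_ratio * total_steps)
print(warmup_steps)              # 2890
```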
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.5084 | 0.0500 | 289 | 0.2410 |
0.234 | 0.0999 | 578 | 0.1884 |
0.1708 | 0.1499 | 867 | 0.0843 |
0.1507 | 0.1998 | 1156 | 0.1094 |
0.131 | 0.2498 | 1445 | 0.0842 |
0.1308 | 0.2997 | 1734 | 0.0251 |
0.1368 | 0.3497 | 2023 | 0.0493 |
0.0905 | 0.3996 | 2312 | 0.0474 |
0.0953 | 0.4496 | 2601 | 0.0312 |
0.0922 | 0.4995 | 2890 | 0.0578 |
0.0792 | 0.5495 | 3179 | 0.0359 |
0.0792 | 0.5994 | 3468 | 0.0271 |
0.0798 | 0.6494 | 3757 | 0.0293 |
0.0666 | 0.6993 | 4046 | 0.0375 |
0.0483 | 0.7493 | 4335 | 0.0177 |
0.0391 | 0.7993 | 4624 | 0.0203 |
0.0374 | 0.8492 | 4913 | 0.0299 |
0.0453 | 0.8992 | 5202 | 0.0241 |
0.0481 | 0.9491 | 5491 | 0.0324 |
0.0418 | 0.9991 | 5780 | 0.0221 |
0.0408 | 1.0489 | 6069 | 0.0234 |
0.0307 | 1.0989 | 6358 | 0.0220 |
0.0482 | 1.1488 | 6647 | 0.0184 |
0.0314 | 1.1988 | 6936 | 0.0117 |
0.0289 | 1.2487 | 7225 | 0.0151 |
0.0346 | 1.2987 | 7514 | 0.0203 |
0.0272 | 1.3486 | 7803 | 0.0193 |
0.0269 | 1.3986 | 8092 | 0.0316 |
0.0325 | 1.4485 | 8381 | 0.0227 |
0.0257 | 1.4985 | 8670 | 0.0174 |
0.0293 | 1.5485 | 8959 | 0.0227 |
0.0244 | 1.5984 | 9248 | 0.0131 |
0.0246 | 1.6484 | 9537 | 0.0145 |
0.0228 | 1.6983 | 9826 | 0.0146 |
0.0236 | 1.7483 | 10115 | 0.0177 |
0.0266 | 1.7982 | 10404 | 0.0134 |
0.0225 | 1.8482 | 10693 | 0.0235 |
0.0217 | 1.8981 | 10982 | 0.0161 |
0.0185 | 1.9481 | 11271 | 0.0120 |
0.0236 | 1.9980 | 11560 | 0.0145 |
0.0265 | 2.0479 | 11849 | 0.0143 |
0.0239 | 2.0978 | 12138 | 0.0142 |
0.0181 | 2.1478 | 12427 | 0.0149 |
0.0182 | 2.1977 | 12716 | 0.0144 |
0.0162 | 2.2477 | 13005 | 0.0124 |
0.0182 | 2.2976 | 13294 | 0.0136 |
0.0173 | 2.3476 | 13583 | 0.0154 |
0.0248 | 2.3976 | 13872 | 0.0157 |
0.0184 | 2.4475 | 14161 | 0.0152 |
0.0234 | 2.4975 | 14450 | 0.0116 |
0.0165 | 2.5474 | 14739 | 0.0109 |
0.0186 | 2.5974 | 15028 | 0.0110 |
0.019 | 2.6473 | 15317 | 0.0108 |
0.0153 | 2.6973 | 15606 | 0.0108 |
0.0163 | 2.7472 | 15895 | 0.0108 |
0.0188 | 2.7972 | 16184 | 0.0102 |
0.0258 | 2.8471 | 16473 | 0.0235 |
0.0313 | 2.8971 | 16762 | 0.0155 |
0.0382 | 2.9470 | 17051 | 0.0320 |
0.0324 | 2.9970 | 17340 | 0.0159 |
0.0353 | 3.0468 | 17629 | 0.0303 |
0.0404 | 3.0968 | 17918 | 0.0223 |
0.0402 | 3.1467 | 18207 | 0.0386 |
0.0316 | 3.1967 | 18496 | 0.0208 |
0.0308 | 3.2467 | 18785 | 0.0233 |
0.0286 | 3.2966 | 19074 | 0.0242 |
0.027 | 3.3466 | 19363 | 0.0244 |
0.028 | 3.3965 | 19652 | 0.0199 |
0.0278 | 3.4465 | 19941 | 0.0258 |
0.0239 | 3.4964 | 20230 | 0.0185 |
0.0262 | 3.5464 | 20519 | 0.0218 |
0.0358 | 3.5963 | 20808 | 0.0522 |
0.0284 | 3.6463 | 21097 | 0.0157 |
0.0308 | 3.6962 | 21386 | 0.0176 |
0.0208 | 3.7462 | 21675 | 0.0156 |
0.0269 | 3.7961 | 21964 | 0.0085 |
0.024 | 3.8461 | 22253 | 0.0096 |
0.0249 | 3.8961 | 22542 | 0.0151 |
0.0236 | 3.9460 | 22831 | 0.0198 |
0.0213 | 3.9960 | 23120 | 0.0173 |
0.0197 | 4.0458 | 23409 | 0.0140 |
0.0231 | 4.0958 | 23698 | 0.0168 |
0.0214 | 4.1457 | 23987 | 0.0124 |
0.0222 | 4.1957 | 24276 | 0.0091 |
0.0231 | 4.2456 | 24565 | 0.0072 |
0.0193 | 4.2956 | 24854 | 0.0151 |
0.021 | 4.3455 | 25143 | 0.0073 |
0.0187 | 4.3955 | 25432 | 0.0102 |
0.0186 | 4.4454 | 25721 | 0.0166 |
0.0201 | 4.4954 | 26010 | 0.0135 |
0.0182 | 4.5453 | 26299 | 0.0099 |
0.0171 | 4.5953 | 26588 | 0.0101 |
0.0187 | 4.6452 | 26877 | 0.0097 |
0.0174 | 4.6952 | 27166 | 0.0097 |
0.0185 | 4.7452 | 27455 | 0.0089 |
0.0145 | 4.7951 | 27744 | 0.0090 |
0.0194 | 4.8451 | 28033 | 0.0068 |
0.0156 | 4.8950 | 28322 | 0.0067 |
0.0169 | 4.9450 | 28611 | 0.0067 |
0.0153 | 4.9949 | 28900 | 0.0068 |
### Framework versions
- Transformers 4.50.3
- PyTorch 2.5.1+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
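To approximate this environment, the listed versions can be pinned directly (a sketch; the package names are assumed to be the standard PyPI distributions, and the `+cu124` tag is assumed to come from the CUDA 12.4 wheel index):

```shell
# Pin the framework versions reported above.
pip install "transformers==4.50.3" "datasets==3.5.0" "tokenizers==0.21.1"
# The +cu124 suffix indicates a CUDA 12.4 build of PyTorch:
pip install "torch==2.5.1" --index-url https://download.pytorch.org/whl/cu124
```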
## Model tree for yangwooko/smartmind-cyberone-20250405

- Base model: Qwen/Qwen2.5-3B
- Finetuned: Qwen/Qwen2.5-3B-Instruct
- Finetuned: PowerInfer/SmallThinker-3B-Preview
- Finetuned: this model