smartmind-cyberone-20250405

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0068
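
Assuming the reported loss is mean per-token cross-entropy (the Transformers Trainer default for causal language modeling — an assumption, since the card does not say), the implied evaluation perplexity is exp(loss):

```python
import math

eval_loss = 0.0068  # reported evaluation loss (assumed mean token cross-entropy)
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.4f}")  # ~1.0068, i.e. near-deterministic next-token predictions
```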

Model description

More information needed

Intended uses & limitations

More information needed
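
The card gives no usage guidance; as a minimal sketch, the checkpoint should load like any Transformers causal-LM model. The repo id is taken from the model-tree section below; the generation settings are illustrative assumptions, not from the card:

```python
MODEL_ID = "yangwooko/smartmind-cyberone-20250405"  # repo id from the model tree

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Sketch: load the fine-tuned checkpoint and generate (illustrative settings)."""
    # Imports are kept inside the function so the sketch can be read without
    # downloading the ~3B-parameter checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Hello, who are you?"))
```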

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
  • mixed_precision_training: Native AMP
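
These numbers are internally consistent; a quick check (the steps-per-epoch figure is read off the results table below, where epoch ~1.0 lands at step 5,780):

```python
per_device_batch = 8      # train_batch_size
grad_accum = 8            # gradient_accumulation_steps
total_batch = per_device_batch * grad_accum
print(total_batch)        # 64, matching total_train_batch_size

steps_per_epoch = 5780    # from the results table: epoch ~1.0 at step 5780
num_epochs = 5
total_steps = steps_per_epoch * num_epochs
print(total_steps)        # 28900, matching the final logged step

warmup_steps = int(0.1 * total_steps)  # lr_scheduler_warmup_ratio = 0.1
print(warmup_steps)       # 2890
```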

Training results

Training Loss | Epoch  | Step  | Validation Loss
0.5084 | 0.0500 | 289 | 0.2410
0.234 | 0.0999 | 578 | 0.1884
0.1708 | 0.1499 | 867 | 0.0843
0.1507 | 0.1998 | 1156 | 0.1094
0.131 | 0.2498 | 1445 | 0.0842
0.1308 | 0.2997 | 1734 | 0.0251
0.1368 | 0.3497 | 2023 | 0.0493
0.0905 | 0.3996 | 2312 | 0.0474
0.0953 | 0.4496 | 2601 | 0.0312
0.0922 | 0.4995 | 2890 | 0.0578
0.0792 | 0.5495 | 3179 | 0.0359
0.0792 | 0.5994 | 3468 | 0.0271
0.0798 | 0.6494 | 3757 | 0.0293
0.0666 | 0.6993 | 4046 | 0.0375
0.0483 | 0.7493 | 4335 | 0.0177
0.0391 | 0.7993 | 4624 | 0.0203
0.0374 | 0.8492 | 4913 | 0.0299
0.0453 | 0.8992 | 5202 | 0.0241
0.0481 | 0.9491 | 5491 | 0.0324
0.0418 | 0.9991 | 5780 | 0.0221
0.0408 | 1.0489 | 6069 | 0.0234
0.0307 | 1.0989 | 6358 | 0.0220
0.0482 | 1.1488 | 6647 | 0.0184
0.0314 | 1.1988 | 6936 | 0.0117
0.0289 | 1.2487 | 7225 | 0.0151
0.0346 | 1.2987 | 7514 | 0.0203
0.0272 | 1.3486 | 7803 | 0.0193
0.0269 | 1.3986 | 8092 | 0.0316
0.0325 | 1.4485 | 8381 | 0.0227
0.0257 | 1.4985 | 8670 | 0.0174
0.0293 | 1.5485 | 8959 | 0.0227
0.0244 | 1.5984 | 9248 | 0.0131
0.0246 | 1.6484 | 9537 | 0.0145
0.0228 | 1.6983 | 9826 | 0.0146
0.0236 | 1.7483 | 10115 | 0.0177
0.0266 | 1.7982 | 10404 | 0.0134
0.0225 | 1.8482 | 10693 | 0.0235
0.0217 | 1.8981 | 10982 | 0.0161
0.0185 | 1.9481 | 11271 | 0.0120
0.0236 | 1.9980 | 11560 | 0.0145
0.0265 | 2.0479 | 11849 | 0.0143
0.0239 | 2.0978 | 12138 | 0.0142
0.0181 | 2.1478 | 12427 | 0.0149
0.0182 | 2.1977 | 12716 | 0.0144
0.0162 | 2.2477 | 13005 | 0.0124
0.0182 | 2.2976 | 13294 | 0.0136
0.0173 | 2.3476 | 13583 | 0.0154
0.0248 | 2.3976 | 13872 | 0.0157
0.0184 | 2.4475 | 14161 | 0.0152
0.0234 | 2.4975 | 14450 | 0.0116
0.0165 | 2.5474 | 14739 | 0.0109
0.0186 | 2.5974 | 15028 | 0.0110
0.019 | 2.6473 | 15317 | 0.0108
0.0153 | 2.6973 | 15606 | 0.0108
0.0163 | 2.7472 | 15895 | 0.0108
0.0188 | 2.7972 | 16184 | 0.0102
0.0258 | 2.8471 | 16473 | 0.0235
0.0313 | 2.8971 | 16762 | 0.0155
0.0382 | 2.9470 | 17051 | 0.0320
0.0324 | 2.9970 | 17340 | 0.0159
0.0353 | 3.0468 | 17629 | 0.0303
0.0404 | 3.0968 | 17918 | 0.0223
0.0402 | 3.1467 | 18207 | 0.0386
0.0316 | 3.1967 | 18496 | 0.0208
0.0308 | 3.2467 | 18785 | 0.0233
0.0286 | 3.2966 | 19074 | 0.0242
0.027 | 3.3466 | 19363 | 0.0244
0.028 | 3.3965 | 19652 | 0.0199
0.0278 | 3.4465 | 19941 | 0.0258
0.0239 | 3.4964 | 20230 | 0.0185
0.0262 | 3.5464 | 20519 | 0.0218
0.0358 | 3.5963 | 20808 | 0.0522
0.0284 | 3.6463 | 21097 | 0.0157
0.0308 | 3.6962 | 21386 | 0.0176
0.0208 | 3.7462 | 21675 | 0.0156
0.0269 | 3.7961 | 21964 | 0.0085
0.024 | 3.8461 | 22253 | 0.0096
0.0249 | 3.8961 | 22542 | 0.0151
0.0236 | 3.9460 | 22831 | 0.0198
0.0213 | 3.9960 | 23120 | 0.0173
0.0197 | 4.0458 | 23409 | 0.0140
0.0231 | 4.0958 | 23698 | 0.0168
0.0214 | 4.1457 | 23987 | 0.0124
0.0222 | 4.1957 | 24276 | 0.0091
0.0231 | 4.2456 | 24565 | 0.0072
0.0193 | 4.2956 | 24854 | 0.0151
0.021 | 4.3455 | 25143 | 0.0073
0.0187 | 4.3955 | 25432 | 0.0102
0.0186 | 4.4454 | 25721 | 0.0166
0.0201 | 4.4954 | 26010 | 0.0135
0.0182 | 4.5453 | 26299 | 0.0099
0.0171 | 4.5953 | 26588 | 0.0101
0.0187 | 4.6452 | 26877 | 0.0097
0.0174 | 4.6952 | 27166 | 0.0097
0.0185 | 4.7452 | 27455 | 0.0089
0.0145 | 4.7951 | 27744 | 0.0090
0.0194 | 4.8451 | 28033 | 0.0068
0.0156 | 4.8950 | 28322 | 0.0067
0.0169 | 4.9450 | 28611 | 0.0067
0.0153 | 4.9949 | 28900 | 0.0068
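
The schedule named above, cosine_with_restarts with 10% warmup, corresponds to Transformers' get_cosine_with_hard_restarts_schedule_with_warmup. A pure-Python sketch of the LR multiplier over training (the number of restart cycles is not recorded in the card, so a single cycle is assumed):

```python
import math

def lr_factor(step: int, warmup: int, total: int, cycles: int = 1) -> float:
    """Multiplier applied to the base LR (1e-05) at a given optimizer step."""
    if step < warmup:
        return step / max(1, warmup)  # linear warmup
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Cosine decay, restarting from the peak `cycles` times.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((cycles * progress) % 1.0))))

total_steps = 28900   # 5 epochs x 5780 steps, from the table above
warmup_steps = 2890   # warmup_ratio 0.1
print(lr_factor(0, warmup_steps, total_steps))            # 0.0 at the first step
print(lr_factor(warmup_steps, warmup_steps, total_steps)) # 1.0, peak LR after warmup
print(round(lr_factor(15895, warmup_steps, total_steps), 3))  # 0.5, halfway through decay
```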

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1

Model size: 3.09B params (Safetensors, FP16)

Model tree for yangwooko/smartmind-cyberone-20250405

  • Base model: Qwen/Qwen2.5-3B
  • Fine-tuned (one of 13): this model