smartmind-cyberone-20250405

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0068
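
Assuming the reported loss is mean per-token cross-entropy (the Transformers Trainer default for causal language modeling — an assumption, since the card does not say), the implied evaluation perplexity is exp(loss):

```python
import math

eval_loss = 0.0068  # reported evaluation loss (assumed mean token cross-entropy)
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.4f}")  # ~1.0068, i.e. near-deterministic next-token predictions
```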

Model description

More information needed

Intended uses & limitations

More information needed
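
The card gives no usage guidance; as a minimal sketch, the checkpoint should load like any Transformers causal-LM model. The repo id is taken from the model-tree section below; the generation settings are illustrative assumptions, not from the card:

```python
MODEL_ID = "yangwooko/smartmind-cyberone-20250405"  # repo id from the model tree

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Sketch: load the fine-tuned checkpoint and generate (illustrative settings)."""
    # Imports are kept inside the function so the sketch can be read without
    # downloading the ~3B-parameter checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Hello, who are you?"))
```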

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
  • mixed_precision_training: Native AMP
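
These numbers are internally consistent; a quick check (the steps-per-epoch figure is read off the results table below, where epoch ~1.0 lands at step 5,780):

```python
per_device_batch = 8      # train_batch_size
grad_accum = 8            # gradient_accumulation_steps
total_batch = per_device_batch * grad_accum
print(total_batch)        # 64, matching total_train_batch_size

steps_per_epoch = 5780    # from the results table: epoch ~1.0 at step 5780
num_epochs = 5
total_steps = steps_per_epoch * num_epochs
print(total_steps)        # 28900, matching the final logged step

warmup_steps = int(0.1 * total_steps)  # lr_scheduler_warmup_ratio = 0.1
print(warmup_steps)       # 2890
```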

Training results

Training Loss | Epoch  | Step  | Validation Loss
0.5084 | 0.0500 | 289 | 0.2410
0.234 | 0.0999 | 578 | 0.1884
0.1708 | 0.1499 | 867 | 0.0843
0.1507 | 0.1998 | 1156 | 0.1094
0.131 | 0.2498 | 1445 | 0.0842
0.1308 | 0.2997 | 1734 | 0.0251
0.1368 | 0.3497 | 2023 | 0.0493
0.0905 | 0.3996 | 2312 | 0.0474
0.0953 | 0.4496 | 2601 | 0.0312
0.0922 | 0.4995 | 2890 | 0.0578
0.0792 | 0.5495 | 3179 | 0.0359
0.0792 | 0.5994 | 3468 | 0.0271
0.0798 | 0.6494 | 3757 | 0.0293
0.0666 | 0.6993 | 4046 | 0.0375
0.0483 | 0.7493 | 4335 | 0.0177
0.0391 | 0.7993 | 4624 | 0.0203
0.0374 | 0.8492 | 4913 | 0.0299
0.0453 | 0.8992 | 5202 | 0.0241
0.0481 | 0.9491 | 5491 | 0.0324
0.0418 | 0.9991 | 5780 | 0.0221
0.0408 | 1.0489 | 6069 | 0.0234
0.0307 | 1.0989 | 6358 | 0.0220
0.0482 | 1.1488 | 6647 | 0.0184
0.0314 | 1.1988 | 6936 | 0.0117
0.0289 | 1.2487 | 7225 | 0.0151
0.0346 | 1.2987 | 7514 | 0.0203
0.0272 | 1.3486 | 7803 | 0.0193
0.0269 | 1.3986 | 8092 | 0.0316
0.0325 | 1.4485 | 8381 | 0.0227
0.0257 | 1.4985 | 8670 | 0.0174
0.0293 | 1.5485 | 8959 | 0.0227
0.0244 | 1.5984 | 9248 | 0.0131
0.0246 | 1.6484 | 9537 | 0.0145
0.0228 | 1.6983 | 9826 | 0.0146
0.0236 | 1.7483 | 10115 | 0.0177
0.0266 | 1.7982 | 10404 | 0.0134
0.0225 | 1.8482 | 10693 | 0.0235
0.0217 | 1.8981 | 10982 | 0.0161
0.0185 | 1.9481 | 11271 | 0.0120
0.0236 | 1.9980 | 11560 | 0.0145
0.0265 | 2.0479 | 11849 | 0.0143
0.0239 | 2.0978 | 12138 | 0.0142
0.0181 | 2.1478 | 12427 | 0.0149
0.0182 | 2.1977 | 12716 | 0.0144
0.0162 | 2.2477 | 13005 | 0.0124
0.0182 | 2.2976 | 13294 | 0.0136
0.0173 | 2.3476 | 13583 | 0.0154
0.0248 | 2.3976 | 13872 | 0.0157
0.0184 | 2.4475 | 14161 | 0.0152
0.0234 | 2.4975 | 14450 | 0.0116
0.0165 | 2.5474 | 14739 | 0.0109
0.0186 | 2.5974 | 15028 | 0.0110
0.019 | 2.6473 | 15317 | 0.0108
0.0153 | 2.6973 | 15606 | 0.0108
0.0163 | 2.7472 | 15895 | 0.0108
0.0188 | 2.7972 | 16184 | 0.0102
0.0258 | 2.8471 | 16473 | 0.0235
0.0313 | 2.8971 | 16762 | 0.0155
0.0382 | 2.9470 | 17051 | 0.0320
0.0324 | 2.9970 | 17340 | 0.0159
0.0353 | 3.0468 | 17629 | 0.0303
0.0404 | 3.0968 | 17918 | 0.0223
0.0402 | 3.1467 | 18207 | 0.0386
0.0316 | 3.1967 | 18496 | 0.0208
0.0308 | 3.2467 | 18785 | 0.0233
0.0286 | 3.2966 | 19074 | 0.0242
0.027 | 3.3466 | 19363 | 0.0244
0.028 | 3.3965 | 19652 | 0.0199
0.0278 | 3.4465 | 19941 | 0.0258
0.0239 | 3.4964 | 20230 | 0.0185
0.0262 | 3.5464 | 20519 | 0.0218
0.0358 | 3.5963 | 20808 | 0.0522
0.0284 | 3.6463 | 21097 | 0.0157
0.0308 | 3.6962 | 21386 | 0.0176
0.0208 | 3.7462 | 21675 | 0.0156
0.0269 | 3.7961 | 21964 | 0.0085
0.024 | 3.8461 | 22253 | 0.0096
0.0249 | 3.8961 | 22542 | 0.0151
0.0236 | 3.9460 | 22831 | 0.0198
0.0213 | 3.9960 | 23120 | 0.0173
0.0197 | 4.0458 | 23409 | 0.0140
0.0231 | 4.0958 | 23698 | 0.0168
0.0214 | 4.1457 | 23987 | 0.0124
0.0222 | 4.1957 | 24276 | 0.0091
0.0231 | 4.2456 | 24565 | 0.0072
0.0193 | 4.2956 | 24854 | 0.0151
0.021 | 4.3455 | 25143 | 0.0073
0.0187 | 4.3955 | 25432 | 0.0102
0.0186 | 4.4454 | 25721 | 0.0166
0.0201 | 4.4954 | 26010 | 0.0135
0.0182 | 4.5453 | 26299 | 0.0099
0.0171 | 4.5953 | 26588 | 0.0101
0.0187 | 4.6452 | 26877 | 0.0097
0.0174 | 4.6952 | 27166 | 0.0097
0.0185 | 4.7452 | 27455 | 0.0089
0.0145 | 4.7951 | 27744 | 0.0090
0.0194 | 4.8451 | 28033 | 0.0068
0.0156 | 4.8950 | 28322 | 0.0067
0.0169 | 4.9450 | 28611 | 0.0067
0.0153 | 4.9949 | 28900 | 0.0068
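
The schedule named above, cosine_with_restarts with 10% warmup, corresponds to Transformers' get_cosine_with_hard_restarts_schedule_with_warmup. A pure-Python sketch of the LR multiplier over training (the number of restart cycles is not recorded in the card, so a single cycle is assumed):

```python
import math

def lr_factor(step: int, warmup: int, total: int, cycles: int = 1) -> float:
    """Multiplier applied to the base LR (1e-05) at a given optimizer step."""
    if step < warmup:
        return step / max(1, warmup)  # linear warmup
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Cosine decay, restarting from the peak `cycles` times.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((cycles * progress) % 1.0))))

total_steps = 28900   # 5 epochs x 5780 steps, from the table above
warmup_steps = 2890   # warmup_ratio 0.1
print(lr_factor(0, warmup_steps, total_steps))            # 0.0 at the first step
print(lr_factor(warmup_steps, warmup_steps, total_steps)) # 1.0, peak LR after warmup
print(round(lr_factor(15895, warmup_steps, total_steps), 3))  # 0.5, halfway through decay
```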

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1

Model size: 3.09B params (Safetensors, FP16)

Model tree for yangwooko/smartmind-cyberone-20250405

  • Base model: Qwen/Qwen2.5-3B
  • Fine-tuned (one of 13): this model