model_chinese_fineweb_v2_hq8_score

This model was trained from scratch. The training dataset is not specified (the card lists it as None). Results on the evaluation set are reported per checkpoint in the table at the end of this card.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 512
eval_batch_size: 256
seed: 0
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_steps: 500
num_epochs: 20
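
To make the learning-rate settings above concrete, here is a minimal sketch of a constant-with-warmup schedule in plain Python. It assumes the usual definition (a linear ramp from zero over the warmup steps, then a flat rate); the function name `constant_with_warmup_lr` is hypothetical and is not taken from the training code.

```python
def constant_with_warmup_lr(step: int,
                            base_lr: float = 3e-05,
                            warmup_steps: int = 500) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear ramp from 0 to base_lr over `warmup_steps`,
    then constant at base_lr for the rest of training.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    # Print the rate at a few steps that also appear in the training log.
    for step in (0, 100, 250, 500, 2300):
        print(step, constant_with_warmup_lr(step))
```

Under these settings the rate reaches its full value of 3e-05 only at step 500, which the training log places at epoch 4.2017, so roughly the first four epochs run at a reduced learning rate.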

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1 Macro	Accuracy
0.0383	0.8403	100	0.0334	0.9932	0.9931	0.9931	0.9932
0.0139	1.6807	200	0.0153	0.9955	0.9953	0.9954	0.9954
0.0062	2.5210	300	0.0134	0.9962	0.9962	0.9962	0.9962
0.0068	3.3613	400	0.0127	0.9967	0.9967	0.9967	0.9967
0.0022	4.2017	500	0.0164	0.9954	0.9954	0.9954	0.9954
0.0023	5.0420	600	0.0162	0.9958	0.9959	0.9958	0.9959
0.0049	5.8824	700	0.0163	0.9953	0.9950	0.9951	0.9951
0.0031	6.7227	800	0.0186	0.9957	0.9954	0.9956	0.9956
0.0015	7.5630	900	0.0195	0.9951	0.9950	0.9951	0.9951
0.0007	8.4034	1000	0.0183	0.9958	0.9957	0.9958	0.9958
0.0004	9.2437	1100	0.0189	0.9962	0.9962	0.9962	0.9962
0.001	10.0840	1200	0.0136	0.9965	0.9965	0.9965	0.9965
0.0001	10.9244	1300	0.0189	0.9967	0.9966	0.9966	0.9966
0.0006	11.7647	1400	0.0190	0.9967	0.9966	0.9966	0.9966
0.002	12.6050	1500	0.0242	0.9952	0.9955	0.9953	0.9953
0.0024	13.4454	1600	0.0159	0.9964	0.9964	0.9964	0.9964
0.0013	14.2857	1700	0.0168	0.9968	0.9967	0.9968	0.9968
0.001	15.1261	1800	0.0237	0.9954	0.9954	0.9954	0.9954
0.0008	15.9664	1900	0.0159	0.9969	0.9968	0.9968	0.9968
0.0025	16.8067	2000	0.0205	0.9966	0.9963	0.9964	0.9964
0.0001	17.6471	2100	0.0203	0.9959	0.9961	0.9960	0.9960
0.0001	18.4874	2200	0.0188	0.9963	0.9961	0.9962	0.9962
0.0012	19.3277	2300	0.0194	0.9966	0.9966	0.9966	0.9966
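
The log above can be summarized with a short script. The following is a sketch for reading the table, not part of the original training pipeline: the tuple layout and variable names are illustrative, and the steps-per-epoch estimate assumes every epoch has the same number of optimizer steps.

```python
# Rows copied from the table above: (epoch, step, validation_loss, f1_macro).
ROWS = [
    (0.8403, 100, 0.0334, 0.9931),
    (1.6807, 200, 0.0153, 0.9954),
    (2.5210, 300, 0.0134, 0.9962),
    (3.3613, 400, 0.0127, 0.9967),
    (4.2017, 500, 0.0164, 0.9954),
    (5.0420, 600, 0.0162, 0.9958),
    (5.8824, 700, 0.0163, 0.9951),
    (6.7227, 800, 0.0186, 0.9956),
    (7.5630, 900, 0.0195, 0.9951),
    (8.4034, 1000, 0.0183, 0.9958),
    (9.2437, 1100, 0.0189, 0.9962),
    (10.0840, 1200, 0.0136, 0.9965),
    (10.9244, 1300, 0.0189, 0.9966),
    (11.7647, 1400, 0.0190, 0.9966),
    (12.6050, 1500, 0.0242, 0.9953),
    (13.4454, 1600, 0.0159, 0.9964),
    (14.2857, 1700, 0.0168, 0.9968),
    (15.1261, 1800, 0.0237, 0.9954),
    (15.9664, 1900, 0.0159, 0.9968),
    (16.8067, 2000, 0.0205, 0.9964),
    (17.6471, 2100, 0.0203, 0.9960),
    (18.4874, 2200, 0.0188, 0.9962),
    (19.3277, 2300, 0.0194, 0.9966),
]

# Checkpoint with the lowest validation loss.
best_loss = min(ROWS, key=lambda r: r[2])

# All checkpoints that share the highest macro F1.
best_f1 = max(r[3] for r in ROWS)
best_f1_steps = [r[1] for r in ROWS if r[3] == best_f1]

# Estimate optimizer steps per epoch from the first logged row
# (assumes a constant number of steps in every epoch).
steps_per_epoch = round(ROWS[0][1] / ROWS[0][0])

if __name__ == "__main__":
    print("lowest validation loss:", best_loss[2], "at step", best_loss[1])
    print("highest F1 macro:", best_f1, "at steps", best_f1_steps)
    print("estimated steps per epoch:", steps_per_epoch)
```

Read this way, validation loss is lowest at step 400, before warmup ends, while macro F1 stays between 0.9951 and 0.9968 from step 200 onward. The later checkpoints therefore differ mostly by small fluctuations and show no steady improvement.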