mistral_logical_3k_data
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 (the fine-tuning dataset is not specified in this card). It achieves the following results on the evaluation set:
- Loss: 0.0373
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7
- mixed_precision_training: Native AMP
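For reference, a minimal sketch of how the hyperparameters above could be expressed with transformers.TrainingArguments. The original training script is not part of this card, so the output directory is a placeholder and the fp16 flag is an assumption about how "Native AMP" was enabled:

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral_logical_3k_data",  # placeholder output directory
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,         # effective train batch size 2 * 8 = 16
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7,
    fp16=True,                             # assumed equivalent of "Native AMP"
)
```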
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0583 | 0.2963 | 50 | 0.0501 |
0.0409 | 0.5926 | 100 | 0.0409 |
0.1111 | 0.8889 | 150 | 0.1679 |
0.1588 | 1.1896 | 200 | 0.0498 |
0.2175 | 1.4859 | 250 | 0.0589 |
0.1318 | 1.7822 | 300 | 0.3170 |
0.0651 | 2.0830 | 350 | 0.0869 |
0.0707 | 2.3793 | 400 | 0.0603 |
0.0693 | 2.6756 | 450 | 0.0518 |
0.0467 | 2.9719 | 500 | 0.0475 |
0.0422 | 3.2726 | 550 | 0.0411 |
0.0379 | 3.5689 | 600 | 0.0395 |
0.0386 | 3.8652 | 650 | 0.0392 |
0.038 | 4.1659 | 700 | 0.0384 |
0.038 | 4.4622 | 750 | 0.0383 |
0.0364 | 4.7585 | 800 | 0.0380 |
0.0396 | 5.0593 | 850 | 0.0377 |
0.0361 | 5.3556 | 900 | 0.0375 |
0.0372 | 5.6519 | 950 | 0.0374 |
0.0375 | 5.9481 | 1000 | 0.0374 |
0.0375 | 6.2489 | 1050 | 0.0373 |
0.0364 | 6.5452 | 1100 | 0.0373 |
0.0365 | 6.8415 | 1150 | 0.0373 |
Framework versions
- PEFT 0.14.0
- Transformers 4.48.3
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
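Given the PEFT dependency above, a minimal loading sketch, assuming this repository hosts a PEFT adapter for the mistralai/Mistral-7B-v0.1 base model; the dtype, device settings, and example prompt are illustrative, not taken from the card:

```python
# Hypothetical usage sketch: load the adapter from this repo on top of the base model.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "Nazneen39/mistral_logical_3k_data",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "If all A are B and all B are C, are all A necessarily C?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```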
Model tree for Nazneen39/mistral_logical_3k_data
- Base model: mistralai/Mistral-7B-v0.1