myBit-Llama2-jp-127M-8

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8181

Model description

More information needed. The repository metadata lists roughly 128M parameters stored as F32 safetensors; further architectural details are not provided in this card.

Intended uses & limitations

More information needed
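
Pending more details from the author, here is a minimal loading sketch using the standard transformers API. The repository id is assumed from the model name above, and the assumption that this is a standard causal-LM checkpoint is not confirmed by the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id, assumed from the model name above.
repo_id = "myBit-Llama2-jp-127M-8"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation. The "jp" in the name suggests Japanese
# training data, but this is an assumption, not confirmed by the card.
inputs = tokenizer("こんにちは、", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```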

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.0024
  • train_batch_size: 96
  • eval_batch_size: 96
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 5000
  • num_epochs: 1
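
As an illustration, the hyperparameters above roughly correspond to the following transformers.TrainingArguments; this is a sketch, not the author's actual training script, and output_dir is a hypothetical placeholder.

```python
from transformers import TrainingArguments

# Minimal sketch mapping the listed hyperparameters onto TrainingArguments.
# "output_dir" is a hypothetical placeholder, not taken from the card.
training_args = TrainingArguments(
    output_dir="myBit-Llama2-jp-127M-8",
    learning_rate=2.4e-3,
    per_device_train_batch_size=96,
    per_device_eval_batch_size=96,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=5000,
    num_train_epochs=1,
)
```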

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 4.6815        | 0.0491 | 2000  | 3.6940          |
| 3.5196        | 0.0982 | 4000  | 3.4577          |
| 3.374         | 0.1473 | 6000  | 3.3326          |
| 3.2643        | 0.1964 | 8000  | 3.2583          |
| 3.2096        | 0.2455 | 10000 | 3.2133          |
| 3.1709        | 0.2946 | 12000 | 3.1826          |
| 3.1461        | 0.3438 | 14000 | 3.1628          |
| 3.1266        | 0.3929 | 16000 | 3.1457          |
| 3.1093        | 0.4420 | 18000 | 3.1261          |
| 3.0896        | 0.4911 | 20000 | 3.1057          |
| 3.0702        | 0.5402 | 22000 | 3.0891          |
| 3.0547        | 0.5893 | 24000 | 3.0700          |
| 3.0348        | 0.6384 | 26000 | 3.0514          |
| 3.0133        | 0.6875 | 28000 | 3.0276          |
| 2.9918        | 0.7366 | 30000 | 3.0044          |
| 2.9631        | 0.7857 | 32000 | 2.9765          |
| 2.9348        | 0.8348 | 34000 | 2.9463          |
| 2.9032        | 0.8839 | 36000 | 2.9124          |
| 2.8677        | 0.9330 | 38000 | 2.8701          |
| 2.82          | 0.9821 | 40000 | 2.8181          |
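
For intuition, the final validation loss can be read as a perplexity, assuming it is the standard per-token cross-entropy in nats (the usual Trainer convention):

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the reported loss is
# per-token cross-entropy in nats (the usual Trainer convention).
final_val_loss = 2.8181
print(math.exp(final_val_loss))  # ≈ 16.75
```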

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1