english-to-hindi-colloquial

This model is a fine-tuned version of unsloth/tinyllama-chat-bnb-4bit; the training dataset is not specified. On the evaluation set, it reaches a validation loss of 12.0502 at the final training step (step 60).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
training_steps: 60
mixed_precision_training: Native AMP
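
As a rough illustration of how the values above fit together, the sketch below recomputes the total train batch size and traces a linear schedule with warmup in plain Python. The helper names (`effective_batch_size`, `linear_lr`) are hypothetical, a single training device is assumed because 4 x 2 = 8 matches the listed total, and the schedule shape (linear warmup to the peak rate, then linear decay to zero) is an assumption about what `lr_scheduler_type: linear` does here rather than code taken from the training run.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
WARMUP_STEPS = 2
TRAINING_STEPS = 60
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int) -> float:
    """Hypothetical helper: linear warmup to the peak rate, then linear decay to zero."""
    if step < WARMUP_STEPS:
        scale = step / max(1, WARMUP_STEPS)
    else:
        remaining = TRAINING_STEPS - step
        scale = max(0.0, remaining / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * scale


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 1, 2, 31, 60):
        print(f"step {step:2d}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the rate peaks at 0.0003 after the two warmup steps and falls to zero by step 60, so the later rows of the training log are produced with a steadily shrinking learning rate.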

Training Loss	Epoch	Step	Validation Loss
14.6625	0.0101	2	12.2272
14.5186	0.0203	4	12.2272
14.505	0.0304	6	12.2272
14.5919	0.0405	8	12.2272
14.2832	0.0506	10	12.2272
14.3709	0.0608	12	12.2272
14.4893	0.0709	14	12.2272
14.5328	0.0810	16	12.2468
10.6599	0.0911	18	12.2062
9.7087	0.1013	20	12.1602
14.2948	0.1114	22	12.0693
8.374	0.1215	24	11.8797
6.0879	0.1316	26	11.7536
5.3437	0.1418	28	11.7127
4.8764	0.1519	30	11.7571
4.4624	0.1620	32	11.7121
4.2256	0.1722	34	11.6351
4.1264	0.1823	36	11.5314
3.9569	0.1924	38	11.5689
3.8794	0.2025	40	11.6288
3.8159	0.2127	42	11.7024
3.7736	0.2228	44	11.7233
3.7625	0.2329	46	11.7436
3.6893	0.2430	48	11.7509
3.7297	0.2532	50	11.7687
3.6863	0.2633	52	11.8390
3.7159	0.2734	54	11.8839
3.6132	0.2835	56	11.9097
3.6394	0.2937	58	11.9677
3.7257	0.3038	60	12.0502
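
The table above shows training loss falling from about 14.7 to about 3.7 while validation loss stays between 11.5 and 12.3, so the final step is not the one with the lowest validation loss. The sketch below copies the step, training loss, and validation loss columns and picks out the best row; `parse_rows` and `best_validation` are hypothetical helper names, and nothing here is taken from the original training code. The card does not say why the two losses diverge, so the script only reports the numbers.

```python
# (step, training loss, validation loss) copied from the table above.
ROWS = """
2 14.6625 12.2272
4 14.5186 12.2272
6 14.505 12.2272
8 14.5919 12.2272
10 14.2832 12.2272
12 14.3709 12.2272
14 14.4893 12.2272
16 14.5328 12.2468
18 10.6599 12.2062
20 9.7087 12.1602
22 14.2948 12.0693
24 8.374 11.8797
26 6.0879 11.7536
28 5.3437 11.7127
30 4.8764 11.7571
32 4.4624 11.7121
34 4.2256 11.6351
36 4.1264 11.5314
38 3.9569 11.5689
40 3.8794 11.6288
42 3.8159 11.7024
44 3.7736 11.7233
46 3.7625 11.7436
48 3.6893 11.7509
50 3.7297 11.7687
52 3.6863 11.8390
54 3.7159 11.8839
56 3.6132 11.9097
58 3.6394 11.9677
60 3.7257 12.0502
"""


def parse_rows(text: str) -> list[tuple[int, float, float]]:
    """Turn whitespace-separated lines into (step, train_loss, val_loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        step, train_loss, val_loss = line.split()
        rows.append((int(step), float(train_loss), float(val_loss)))
    return rows


def best_validation(rows: list[tuple[int, float, float]]) -> tuple[int, float, float]:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    step, train_loss, val_loss = best_validation(rows)
    print(f"lowest validation loss {val_loss} at step {step} (training loss {train_loss})")
    print(f"final validation loss {rows[-1][2]} at step {rows[-1][0]}")
```

A check like this is one way to decide which checkpoint to keep if intermediate checkpoints were saved; whether they were is not stated in this card.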