# my_awesome_qa_model
This model is a fine-tuned version of google/muril-base-cased on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.2420
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 30
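The linear scheduler above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: it assumes zero warmup steps (no warmup is recorded above) and uses the 2,610 total steps implied by the results table (30 epochs × 87 steps per epoch).

```python
def linear_lr(step, base_lr=2e-4, total_steps=2610, warmup_steps=0):
    """Linearly warm up to base_lr, then decay linearly to 0 by total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(linear_lr(0))     # start of training: full base learning rate (0.0002)
print(linear_lr(1305))  # halfway, end of epoch 15: half the base rate (0.0001)
print(linear_lr(2610))  # end of training: decayed to 0.0
```

With this schedule the effective learning rate during the late epochs is tiny, which is why the training loss flattens near zero while the validation loss drifts.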
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 1.0 | 87 | 3.3597 |
| 3.9557 | 2.0 | 174 | 2.1384 |
| 2.3734 | 3.0 | 261 | 1.3267 |
| 1.2469 | 4.0 | 348 | 1.0861 |
| 0.709 | 5.0 | 435 | 1.0629 |
| 0.4988 | 6.0 | 522 | 1.6941 |
| 0.3718 | 7.0 | 609 | 1.3660 |
| 0.3718 | 8.0 | 696 | 2.0104 |
| 0.2292 | 9.0 | 783 | 2.1057 |
| 0.1848 | 10.0 | 870 | 2.1225 |
| 0.1241 | 11.0 | 957 | 1.8473 |
| 0.1352 | 12.0 | 1044 | 1.5934 |
| 0.0767 | 13.0 | 1131 | 1.7822 |
| 0.0589 | 14.0 | 1218 | 1.9077 |
| 0.0502 | 15.0 | 1305 | 1.9062 |
| 0.0502 | 16.0 | 1392 | 1.9073 |
| 0.0559 | 17.0 | 1479 | 1.9963 |
| 0.0441 | 18.0 | 1566 | 1.7880 |
| 0.0296 | 19.0 | 1653 | 2.3304 |
| 0.0204 | 20.0 | 1740 | 2.3634 |
| 0.0165 | 21.0 | 1827 | 2.1404 |
| 0.0152 | 22.0 | 1914 | 1.8899 |
| 0.01 | 23.0 | 2001 | 2.0763 |
| 0.01 | 24.0 | 2088 | 2.2466 |
| 0.0079 | 25.0 | 2175 | 2.2306 |
| 0.0072 | 26.0 | 2262 | 2.2067 |
| 0.0128 | 27.0 | 2349 | 2.2512 |
| 0.0113 | 28.0 | 2436 | 2.2725 |
| 0.0062 | 29.0 | 2523 | 2.2457 |
| 0.0064 | 30.0 | 2610 | 2.2420 |
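Note that the validation loss bottoms out at epoch 5 (1.0629) and then climbs as the model overfits, so the final checkpoint (2.2420) is not the best one. A minimal sketch of selecting the best epoch from the series above (values copied from the table; in `transformers`, `TrainingArguments(load_best_model_at_end=True)` automates this kind of selection):

```python
# Validation loss per epoch, copied from the results table above.
val_loss = [3.3597, 2.1384, 1.3267, 1.0861, 1.0629, 1.6941, 1.3660,
            2.0104, 2.1057, 2.1225, 1.8473, 1.5934, 1.7822, 1.9077,
            1.9062, 1.9073, 1.9963, 1.7880, 2.3304, 2.3634, 2.1404,
            1.8899, 2.0763, 2.2466, 2.2306, 2.2067, 2.2512, 2.2725,
            2.2457, 2.2420]

# Epochs are 1-indexed; min() over (loss, epoch) pairs finds the minimum loss.
best_loss, best_epoch = min((loss, epoch)
                            for epoch, loss in enumerate(val_loss, start=1))
print(best_epoch, best_loss)  # → 5 1.0629
```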
### Framework versions
- Transformers 4.52.4
- PyTorch 2.6.0+cu124
- Datasets 2.14.4
- Tokenizers 0.21.1