datamix-treatment-12l

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 6.6002
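
If this loss is the usual token-level cross-entropy in nats (an assumption; the card does not state the training objective), the corresponding perplexity follows directly:

```python
import math

eval_loss = 6.6002  # final validation loss reported above
# Perplexity = exp(cross-entropy loss), assuming the loss is in nats
perplexity = math.exp(eval_loss)
print(perplexity)  # ≈ 735
```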

Model description

More information needed

Intended uses & limitations

More information needed
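
Although the intended uses are not documented, the checkpoint can be loaded from the Hub for inspection. A minimal sketch, assuming a standard Transformers-compatible checkpoint (the underlying architecture and task head are not stated on this card):

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "Yanjo/datamix-treatment-12l"

# Load the bare model weights; swap in a task-specific Auto class
# (e.g. AutoModelForCausalLM) once the model's task is known.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

print(sum(p.numel() for p in model.parameters()))  # expect roughly 209M
```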

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
  • mixed_precision_training: Native AMP
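
A minimal sketch of how these settings map onto `transformers.TrainingArguments`; the output directory is a placeholder, and the card's "Adam" corresponds to the Trainer's default AdamW optimizer with the listed betas and epsilon:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="datamix-treatment-12l",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=1,
    fp16=True,  # "Native AMP" mixed-precision training
)
```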

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 7.3611        | 0.1010 | 500  | 7.2894          |
| 7.1006        | 0.2020 | 1000 | 7.0496          |
| 6.8657        | 0.3030 | 1500 | 6.9105          |
| 6.9066        | 0.4040 | 2000 | 6.8204          |
| 6.7658        | 0.5051 | 2500 | 6.7608          |
| 6.8033        | 0.6061 | 3000 | 6.6947          |
| 6.6107        | 0.7071 | 3500 | 6.6542          |
| 6.6699        | 0.8081 | 4000 | 6.6237          |
| 6.6169        | 0.9091 | 4500 | 6.6002          |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.0.1
  • Datasets 3.0.0
  • Tokenizers 0.19.1
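
To check that a local environment matches these versions, a small sketch (the pinned strings come straight from the list above):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed on this model card
expected = {
    "transformers": "4.44.2",
    "torch": "2.0.1",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}
found = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, wanted in expected.items():
    # torch versions may carry a local suffix such as "+cu117"
    if not found[name].startswith(wanted):
        print(f"{name}: found {found[name]}, card lists {wanted}")
```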
Model size

  • 209M parameters (F32 tensors, safetensors format)
