zephyr-7b-sft-full

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9411

Model description

A 7.24B-parameter causal language model stored in BF16 (Safetensors), obtained by supervised fine-tuning (SFT) of mistralai/Mistral-7B-v0.1.

Intended uses & limitations

More information needed
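Although the card does not document intended uses or a prompt format, the checkpoint is a plain causal language model and can be loaded with the standard transformers API. Below is a minimal usage sketch, assuming the Hub repository id li-muyang/zephyr-7b-sft-full and a GPU with bfloat16 support (matching the BF16 tensor type of the released weights); the prompt is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "li-muyang/zephyr-7b-sft-full"  # Hub repository id; adjust if loading from a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # place layers on the available GPU(s)
)

prompt = "Explain supervised fine-tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since no chat template is documented on the card, plain-text prompting is the safest assumption here.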

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1
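For readers reproducing the run with the Hugging Face Trainer, the values above translate roughly into the following transformers.TrainingArguments. This is a sketch under assumptions: the output directory and the bf16 flag are not stated on the card (the released weights are BF16), and distribution across the 8 devices comes from the launcher (e.g. accelerate or torchrun) rather than from these arguments.

```python
from transformers import TrainingArguments

# Sketch only: restates the hyperparameters listed above. output_dir and
# bf16 are assumptions; everything else mirrors the card.
training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-full",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    gradient_accumulation_steps=2,   # 8 devices x 8 x 2 = 128 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,
)
```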

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0618        | 0.0231 | 25   | 1.0578          |
| 1.0471        | 0.0461 | 50   | 1.0590          |
| 1.0447        | 0.0692 | 75   | 1.0612          |
| 1.0602        | 0.0923 | 100  | 1.0589          |
| 1.0717        | 0.1154 | 125  | 1.0559          |
| 1.0244        | 0.1384 | 150  | 1.0520          |
| 1.0251        | 0.1615 | 175  | 1.0483          |
| 1.0343        | 0.1846 | 200  | 1.0470          |
| 1.0441        | 0.2077 | 225  | 1.0421          |
| 1.0291        | 0.2307 | 250  | 1.0399          |
| 1.0243        | 0.2538 | 275  | 1.0374          |
| 1.0294        | 0.2769 | 300  | 1.0332          |
| 1.0263        | 0.3000 | 325  | 1.0300          |
| 1.0032        | 0.3230 | 350  | 1.0247          |
| 1.0178        | 0.3461 | 375  | 1.0214          |
| 0.9982        | 0.3692 | 400  | 1.0160          |
| 0.9965        | 0.3922 | 425  | 1.0127          |
| 1.0068        | 0.4153 | 450  | 1.0089          |
| 1.0027        | 0.4384 | 475  | 1.0054          |
| 1.0053        | 0.4615 | 500  | 1.0011          |
| 0.9706        | 0.4845 | 525  | 0.9964          |
| 0.9779        | 0.5076 | 550  | 0.9925          |
| 0.9693        | 0.5307 | 575  | 0.9883          |
| 0.9638        | 0.5538 | 600  | 0.9837          |
| 0.9599        | 0.5768 | 625  | 0.9799          |
| 0.971         | 0.5999 | 650  | 0.9759          |
| 0.9635        | 0.6230 | 675  | 0.9719          |
| 0.9341        | 0.6461 | 700  | 0.9680          |
| 0.9427        | 0.6691 | 725  | 0.9643          |
| 0.9404        | 0.6922 | 750  | 0.9608          |
| 0.934         | 0.7153 | 775  | 0.9575          |
| 0.9212        | 0.7383 | 800  | 0.9548          |
| 0.931         | 0.7614 | 825  | 0.9521          |
| 0.9325        | 0.7845 | 850  | 0.9499          |
| 0.9344        | 0.8076 | 875  | 0.9477          |
| 0.934         | 0.8306 | 900  | 0.9458          |
| 0.9369        | 0.8537 | 925  | 0.9443          |
| 0.9404        | 0.8768 | 950  | 0.9431          |
| 0.9174        | 0.8999 | 975  | 0.9422          |
| 0.9194        | 0.9229 | 1000 | 0.9416          |
| 0.931         | 0.9460 | 1025 | 0.9413          |
| 0.939         | 0.9691 | 1050 | 0.9411          |
| 0.928         | 0.9922 | 1075 | 0.9411          |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+rocm6.2
  • Datasets 3.2.0
  • Tokenizers 0.20.3
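These pins can be verified at runtime; a small sketch, assuming the listed packages are importable in the active environment:

```python
# Sketch: confirm the active environment matches the versions listed above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.45.2
print("PyTorch:", torch.__version__)              # expected 2.5.1+rocm6.2
print("Datasets:", datasets.__version__)          # expected 3.2.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.20.3
```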