
mistral-7b-expo-7b-IPO-25-final-1

This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:

  • Loss: 40.7478
  • Objective: 41.4189
  • Reward Accuracy: 0.6560
  • Logp Accuracy: 0.6449
  • Log Diff Policy: 12.9557
  • Chosen Logps: -164.9055
  • Rejected Logps: -177.8611
  • Chosen Rewards: -0.7022
  • Rejected Rewards: -0.8280
  • Logits: -1.8762
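
The framework list below includes PEFT, so this repository is assumed to contain a PEFT adapter trained on top of the SFT base model rather than merged weights. A minimal loading sketch under that assumption (if the weights were merged, plain `AutoModelForCausalLM.from_pretrained` works instead):

```python
# Hedged loading sketch: assumes this repo holds a PEFT adapter for the
# hZzy/mistral-7b-sft-25-1 base model (see "Framework versions" below).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "hZzy/mistral-7b-expo-7b-IPO-25-final-1"
model = AutoPeftModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Explain the IPO objective in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```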

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 108
  • total_eval_batch_size: 9
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 2
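
The training script itself is not part of this card. Below is a minimal sketch of how the hyperparameters above could be wired up, assuming TRL's `DPOTrainer` with `loss_type="ipo"`; the `beta` value, the precision flag, and the dataset split/column names are assumptions and are not reported here.

```python
# Hedged training sketch: IPO via TRL's DPOTrainer with the hyperparameters
# listed above. beta, bf16, and the dataset split/column names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/mistral-7b-sft-25-1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt"/"chosen"/"rejected" columns (assumed for this dataset).
dataset = load_dataset("hZzy/direction_right2")

args = DPOConfig(
    output_dir="mistral-7b-expo-7b-IPO-25-final-1",
    loss_type="ipo",                  # IPO objective instead of the default sigmoid DPO loss
    beta=0.01,                        # assumed; not reported in this card
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,   # 3 per device x 3 GPUs x 12 steps = 108 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.2,
    num_train_epochs=2,
    seed=42,
    bf16=True,                        # assumed precision for multi-GPU training
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                   # TRL creates a frozen reference copy when None
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,              # newer TRL versions name this processing_class
)
trainer.train()
```

Launching the script with `accelerate launch` (or `torchrun`) across 3 GPUs matches the distributed_type: multi-GPU setting; the evaluation cadence that produced the table below is not recorded in this card.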

Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 49.9605       | 0.1213 | 80   | 49.9610         | 49.9539   | 0.5350          | 0.5176        | 0.4283          | -93.8819     | -94.3102       | 0.0080         | 0.0075           | -2.2018 |
| 49.4535       | 0.2427 | 160  | 49.5361         | 49.5283   | 0.5593          | 0.5249        | 0.8538          | -90.2269     | -91.0807       | 0.0446         | 0.0398           | -2.1641 |
| 44.982        | 0.3640 | 240  | 45.8690         | 46.1325   | 0.5808          | 0.5624        | 4.6125          | -117.8028    | -122.4154      | -0.2312        | -0.2735          | -1.8757 |
| 41.1334       | 0.4853 | 320  | 43.5046         | 43.8982   | 0.6105          | 0.6032        | 7.8409          | -127.0870    | -134.9279      | -0.3240        | -0.3986          | -1.8141 |
| 39.7268       | 0.6067 | 400  | 42.7600         | 43.1496   | 0.6334          | 0.6214        | 10.4767         | -125.7633    | -136.2400      | -0.3108        | -0.4118          | -1.8837 |
| 39.7663       | 0.7280 | 480  | 41.7225         | 42.2563   | 0.6435          | 0.6390        | 10.9695         | -128.9929    | -139.9624      | -0.3431        | -0.4490          | -2.0056 |
| 39.5465       | 0.8493 | 560  | 41.3188         | 41.8171   | 0.6521          | 0.6345        | 10.6324         | -140.6873    | -151.3197      | -0.4601        | -0.5626          | -1.9821 |
| 40.4391       | 0.9707 | 640  | 41.1982         | 41.9165   | 0.6482          | 0.6471        | 12.5170         | -147.6488    | -160.1658      | -0.5297        | -0.6510          | -1.9380 |
| 38.5771       | 1.0920 | 720  | 41.2521         | 41.9791   | 0.6563          | 0.6337        | 12.3195         | -134.2294    | -146.5489      | -0.3955        | -0.5148          | -1.8990 |
| 37.5887       | 1.2133 | 800  | 41.1525         | 41.8777   | 0.6510          | 0.6334        | 11.9934         | -137.1610    | -149.1544      | -0.4248        | -0.5409          | -1.9650 |
| 38.8787       | 1.3347 | 880  | 40.8906         | 41.4724   | 0.6541          | 0.6393        | 12.1420         | -143.8819    | -156.0239      | -0.4920        | -0.6096          | -1.9757 |
| 36.5702       | 1.4560 | 960  | 40.9781         | 41.5046   | 0.6555          | 0.6423        | 12.6362         | -130.7231    | -143.3593      | -0.3604        | -0.4829          | -1.9758 |
| 35.9143       | 1.5774 | 1040 | 40.9837         | 41.6091   | 0.6502          | 0.6432        | 12.7345         | -136.4761    | -149.2107      | -0.4179        | -0.5415          | -1.9899 |
| 36.9408       | 1.6987 | 1120 | 40.8692         | 41.4085   | 0.6555          | 0.6376        | 12.1356         | -152.8772    | -165.0128      | -0.5819        | -0.6995          | -1.7977 |
| 36.6248       | 1.8200 | 1200 | 40.5552         | 41.1028   | 0.6608          | 0.6437        | 12.9051         | -143.6405    | -156.5456      | -0.4896        | -0.6148          | -1.8976 |
| 36.0414       | 1.9414 | 1280 | 40.8334         | 41.4863   | 0.6541          | 0.6395        | 12.5224         | -151.9377    | -164.4601      | -0.5726        | -0.6940          | -1.8392 |
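
A note on how the logged columns relate to one another (following the usual DPO/IPO conventions; the exact definitions are not stated in this card):

```python
# Sketch of the metric relationships, using the final evaluation values above.
chosen_logps, rejected_logps = -164.9055, -177.8611

# "Log Diff Policy" is the chosen-minus-rejected log-probability margin.
log_diff_policy = chosen_logps - rejected_logps   # 12.9556, ~12.9557 after rounding

# Under the standard implicit-reward convention, chosen/rejected rewards are
# beta-scaled log-ratios against the reference model, so "Reward Accuracy" is the
# fraction of pairs with chosen_reward > rejected_reward and "Logp Accuracy" the
# fraction with chosen_logps > rejected_logps.
```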

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1