# mistral-7b-expo-7b-IPO-25-final-1
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set (a short sketch of how these preference metrics are typically computed follows the list):
- Loss: 40.7478
- Objective: 41.4189
- Reward Accuracy: 0.6560
- Logp Accuracy: 0.6449
- Log Diff Policy: 12.9557
- Chosen Logps: -164.9055
- Rejected Logps: -177.8611
- Chosen Rewards: -0.7022
- Rejected Rewards: -0.8280
- Logits: -1.8762
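The reward and accuracy numbers above follow the usual preference-optimization (DPO/IPO-style) conventions: the chosen/rejected rewards are beta-scaled log-probability ratios between the policy and a frozen reference model, reward accuracy is the fraction of pairs where the chosen reward exceeds the rejected one, and the log diff policy is the mean gap between chosen and rejected policy log-probabilities. A minimal sketch of these computations (beta and the reference log-probabilities are illustrative assumptions, not values recorded for this run):

```python
import torch

def preference_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Hypothetical helper showing how metrics like those above are usually derived.

    All inputs are per-example sequence log-probabilities (shape: [batch]);
    beta is an assumed scaling factor, not the one used for this run.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "reward_accuracy": (chosen_rewards > rejected_rewards).float().mean(),
        "logp_accuracy": (policy_chosen_logps > policy_rejected_logps).float().mean(),
        "log_diff_policy": (policy_chosen_logps - policy_rejected_logps).mean(),
        "chosen_rewards": chosen_rewards.mean(),
        "rejected_rewards": rejected_rewards.mean(),
    }
```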
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 2
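A minimal sketch of a `transformers.TrainingArguments` configuration matching the hyperparameters above. The output directory is assumed, and the actual trainer (e.g. a TRL preference trainer with the IPO loss) and any PEFT/LoRA settings are not recorded here:

```python
from transformers import TrainingArguments

# Per-device batch size 3 on 3 GPUs with 12 gradient-accumulation steps
# gives the effective train batch size of 3 * 3 * 12 = 108 reported above.
training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-IPO-25-final-1",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```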
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 49.9605 | 0.1213 | 80 | 49.9610 | 49.9539 | 0.5350 | 0.5176 | 0.4283 | -93.8819 | -94.3102 | 0.0080 | 0.0075 | -2.2018 |
| 49.4535 | 0.2427 | 160 | 49.5361 | 49.5283 | 0.5593 | 0.5249 | 0.8538 | -90.2269 | -91.0807 | 0.0446 | 0.0398 | -2.1641 |
| 44.982 | 0.3640 | 240 | 45.8690 | 46.1325 | 0.5808 | 0.5624 | 4.6125 | -117.8028 | -122.4154 | -0.2312 | -0.2735 | -1.8757 |
| 41.1334 | 0.4853 | 320 | 43.5046 | 43.8982 | 0.6105 | 0.6032 | 7.8409 | -127.0870 | -134.9279 | -0.3240 | -0.3986 | -1.8141 |
| 39.7268 | 0.6067 | 400 | 42.7600 | 43.1496 | 0.6334 | 0.6214 | 10.4767 | -125.7633 | -136.2400 | -0.3108 | -0.4118 | -1.8837 |
| 39.7663 | 0.7280 | 480 | 41.7225 | 42.2563 | 0.6435 | 0.6390 | 10.9695 | -128.9929 | -139.9624 | -0.3431 | -0.4490 | -2.0056 |
| 39.5465 | 0.8493 | 560 | 41.3188 | 41.8171 | 0.6521 | 0.6345 | 10.6324 | -140.6873 | -151.3197 | -0.4601 | -0.5626 | -1.9821 |
| 40.4391 | 0.9707 | 640 | 41.1982 | 41.9165 | 0.6482 | 0.6471 | 12.5170 | -147.6488 | -160.1658 | -0.5297 | -0.6510 | -1.9380 |
| 38.5771 | 1.0920 | 720 | 41.2521 | 41.9791 | 0.6563 | 0.6337 | 12.3195 | -134.2294 | -146.5489 | -0.3955 | -0.5148 | -1.8990 |
| 37.5887 | 1.2133 | 800 | 41.1525 | 41.8777 | 0.6510 | 0.6334 | 11.9934 | -137.1610 | -149.1544 | -0.4248 | -0.5409 | -1.9650 |
| 38.8787 | 1.3347 | 880 | 40.8906 | 41.4724 | 0.6541 | 0.6393 | 12.1420 | -143.8819 | -156.0239 | -0.4920 | -0.6096 | -1.9757 |
| 36.5702 | 1.4560 | 960 | 40.9781 | 41.5046 | 0.6555 | 0.6423 | 12.6362 | -130.7231 | -143.3593 | -0.3604 | -0.4829 | -1.9758 |
| 35.9143 | 1.5774 | 1040 | 40.9837 | 41.6091 | 0.6502 | 0.6432 | 12.7345 | -136.4761 | -149.2107 | -0.4179 | -0.5415 | -1.9899 |
| 36.9408 | 1.6987 | 1120 | 40.8692 | 41.4085 | 0.6555 | 0.6376 | 12.1356 | -152.8772 | -165.0128 | -0.5819 | -0.6995 | -1.7977 |
| 36.6248 | 1.8200 | 1200 | 40.5552 | 41.1028 | 0.6608 | 0.6437 | 12.9051 | -143.6405 | -156.5456 | -0.4896 | -0.6148 | -1.8976 |
| 36.0414 | 1.9414 | 1280 | 40.8334 | 41.4863 | 0.6541 | 0.6395 | 12.5224 | -151.9377 | -164.4601 | -0.5726 | -0.6940 | -1.8392 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
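Given the PEFT version listed above, the repository presumably contains a PEFT (LoRA-style) adapter. A minimal loading sketch, assuming the adapter resolves directly from the Hub; the tokenizer may instead need to come from the base SFT model if the adapter repo does not ship one:

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "hZzy/mistral-7b-expo-7b-IPO-25-final-1"

# Loads the base model referenced in the adapter config and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumption: fall back to the SFT base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("hZzy/mistral-7b-sft-25-1")

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```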
## Model tree for hZzy/mistral-7b-expo-7b-IPO-25-final-1

- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
- Finetuned: hZzy/mistral-7b-sft-25-1