mistral-7b-expo-7b-IPO-25-1
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:
- Loss: 40.9160
- Objective: 41.5373
- Reward Accuracy: 0.6597
- Logp Accuracy: 0.6415
- Log Diff Policy: 10.9617
- Chosen Logps: -133.3625
- Rejected Logps: -144.3242
- Chosen Rewards: -0.3868
- Rejected Rewards: -0.4926
- Logits: -2.0550
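
The card does not define these metrics; the names match the bookkeeping used by DPO/IPO-style preference trainers, so the LaTeX sketch below is only a hedged reading of them (β, τ and the reference policy π_ref are assumptions, not values taken from this card).

```latex
% Hedged reading of the metrics above; beta, tau and the reference policy
% pi_ref are assumptions, not values taken from this card.
\begin{align*}
\text{Log Diff Policy} &= \log\pi_\theta(y_w \mid x) - \log\pi_\theta(y_l \mid x)
  && \text{e.g. } -133.3625 - (-144.3242) = 10.9617 \\
\text{Chosen/Rejected Rewards} &= \beta\left[\log\pi_\theta(y \mid x) - \log\pi_{\text{ref}}(y \mid x)\right] \\
\mathcal{L}_{\text{IPO}} &= \left( \log\frac{\pi_\theta(y_w \mid x)\,\pi_{\text{ref}}(y_l \mid x)}{\pi_\theta(y_l \mid x)\,\pi_{\text{ref}}(y_w \mid x)} - \frac{1}{2\tau} \right)^{2}
\end{align*}
```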
Model description
More information needed
Intended uses & limitations
More information needed
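
No usage example is provided. The sketch below is a hedged starting point that assumes this repository hosts a PEFT adapter (suggested by PEFT 0.11.1 in the framework list, but not stated explicitly) to be loaded on top of the SFT base model `hZzy/mistral-7b-sft-25-1`.

```python
# Hedged usage sketch: assumes this repo hosts a PEFT (LoRA-style) adapter
# trained on top of hZzy/mistral-7b-sft-25-1; the card does not confirm this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hZzy/mistral-7b-sft-25-1"
adapter_id = "hZzy/mistral-7b-expo-7b-IPO-25-1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the preference-tuned adapter

prompt = "Explain what IPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```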
Training and evaluation data
More information needed
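
The card only names the training dataset. Assuming `hZzy/direction_right2` is accessible on the Hub (and follows the usual chosen/rejected preference layout, which is not confirmed here), it can be inspected with the `datasets` library (version 3.2.0 is listed under framework versions).

```python
# Hedged sketch: assumes hZzy/direction_right2 is a public Hub dataset with a
# chosen/rejected preference layout; neither is confirmed by this card.
from datasets import load_dataset

ds = load_dataset("hZzy/direction_right2")
print(ds)                          # show available splits and column names
print(ds[list(ds.keys())[0]][0])   # peek at the first example of the first split
```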
Training procedure
Training hyperparameters
The following hyperparameters were used during training (restated as a config sketch after the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
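
The training script itself is not included. The sketch below only re-expresses the listed hyperparameters as a `transformers.TrainingArguments` object; the preference-loss trainer wrapping it (for example TRL's DPOTrainer with an IPO loss) is an assumption, not something the card specifies. Note how the effective batch size works out: 3 per device × 3 GPUs × 12 accumulation steps = 108, matching `total_train_batch_size`.

```python
# Hedged sketch: restates the hyperparameters listed above as a
# transformers.TrainingArguments config. The surrounding preference trainer
# (e.g. TRL's DPOTrainer with an IPO loss) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-IPO-25-1",
    learning_rate=5e-6,
    per_device_train_batch_size=3,   # x 3 GPUs x 12 accumulation steps = 108 effective
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam-family optimizer; betas=(0.9, 0.999), eps=1e-8 are the defaults
    bf16=True,                       # assumption; training precision is not stated on the card
)
```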
Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49.3588 | 0.1517 | 100 | 49.4313 | 49.4074 | 0.5652 | 0.5271 | 0.9748 | -92.4140 | -93.3888 | 0.0227 | 0.0168 | -2.1900 |
| 42.9865 | 0.3033 | 200 | 44.3119 | 44.8376 | 0.5965 | 0.5867 | 7.1631 | -121.9410 | -129.1041 | -0.2726 | -0.3404 | -1.9037 |
| 42.5687 | 0.4550 | 300 | 42.4967 | 42.9706 | 0.6356 | 0.6172 | 8.8037 | -137.1864 | -145.9901 | -0.4250 | -0.5093 | -2.0350 |
| 38.5474 | 0.6067 | 400 | 41.9759 | 42.5688 | 0.6435 | 0.6211 | 10.6485 | -109.0894 | -119.7378 | -0.1441 | -0.2467 | -2.0654 |
| 38.779 | 0.7583 | 500 | 41.2348 | 41.8556 | 0.6544 | 0.6404 | 12.3982 | -125.9650 | -138.3632 | -0.3128 | -0.4330 | -2.1536 |
| 38.0359 | 0.9100 | 600 | 40.8263 | 41.3470 | 0.6630 | 0.6502 | 11.9953 | -132.4066 | -144.4019 | -0.3772 | -0.4934 | -2.0724 |
Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
Model tree for hZzy/mistral-7b-expo-7b-IPO-25-1
- mistralai/Mistral-7B-v0.3 (base model)
- mistralai/Mistral-7B-Instruct-v0.3 (finetuned from the base)
- hZzy/mistral-7b-sft-25-1 (finetuned; direct parent of this model)