# mistral-7b-expo-7b-L2EXPO-25-last-3
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4652
- Objective: 0.4665
- Reward Accuracy: 0.6468
- Logp Accuracy: 0.5380
- Log Diff Policy: 1.7463
- Chosen Logps: -88.9876
- Rejected Logps: -90.7340
- Chosen Rewards: 0.5695
- Rejected Rewards: 0.4330
- Logits: -2.1608
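
As a consistency check, Log Diff Policy is (up to rounding) the margin between the chosen and rejected log-probabilities: −88.9876 − (−90.7340) ≈ 1.7464, matching the reported 1.7463. Likewise, the reward margin is 0.5695 − 0.4330 = 0.1365.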
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
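
For reference, below is a minimal sketch of how these hyperparameters could be expressed with `transformers.TrainingArguments`. The actual training script is not included in this card, so `output_dir` and the surrounding trainer setup are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above onto TrainingArguments.
# With 3 devices x per-device batch 3 x 12 accumulation steps, the effective
# train batch size is 108, as reported in the card.
training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-last-3",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```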
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5816 | 0.0758 | 50 | 0.5092 | 0.5064 | 0.5489 | 0.5176 | 0.4504 | -93.1500 | -93.6004 | 0.1532 | 0.1464 | -2.1905 |
| 0.5803 | 0.1517 | 100 | 0.4981 | 0.4935 | 0.5710 | 0.5246 | 0.7658 | -94.0984 | -94.8642 | 0.0584 | 0.0200 | -2.2166 |
| 0.6056 | 0.2275 | 150 | 0.4821 | 0.4811 | 0.6035 | 0.5280 | 1.0402 | -92.9769 | -94.0170 | 0.1705 | 0.1047 | -2.2026 |
| 0.5299 | 0.3033 | 200 | 0.4781 | 0.4783 | 0.6177 | 0.5338 | 1.2448 | -91.3817 | -92.6265 | 0.3301 | 0.2438 | -2.2070 |
| 0.5156 | 0.3792 | 250 | 0.4757 | 0.4785 | 0.6205 | 0.5352 | 1.3596 | -92.2695 | -93.6291 | 0.2413 | 0.1435 | -2.2315 |
| 0.5013 | 0.4550 | 300 | 0.4743 | 0.4760 | 0.6312 | 0.5322 | 1.5243 | -91.0031 | -92.5274 | 0.3679 | 0.2537 | -2.2311 |
| 0.4959 | 0.5308 | 350 | 0.4681 | 0.4693 | 0.6337 | 0.5333 | 1.5031 | -90.2225 | -91.7256 | 0.4460 | 0.3339 | -2.2133 |
| 0.4667 | 0.6067 | 400 | 0.4647 | 0.4667 | 0.6395 | 0.5358 | 1.6181 | -91.8421 | -93.4602 | 0.2840 | 0.1604 | -2.1876 |
| 0.4661 | 0.6825 | 450 | 0.4663 | 0.4689 | 0.6298 | 0.5330 | 1.6059 | -90.0967 | -91.7026 | 0.4586 | 0.3362 | -2.1883 |
| 0.5 | 0.7583 | 500 | 0.4699 | 0.4724 | 0.6306 | 0.5361 | 1.6815 | -87.4541 | -89.1356 | 0.7228 | 0.5929 | -2.1850 |
| 0.4319 | 0.8342 | 550 | 0.4681 | 0.4718 | 0.6267 | 0.5366 | 1.7006 | -88.2031 | -89.9036 | 0.6479 | 0.5161 | -2.1868 |
| 0.4536 | 0.9100 | 600 | 0.4632 | 0.4665 | 0.6278 | 0.5358 | 1.6002 | -89.8265 | -91.4267 | 0.4856 | 0.3638 | -2.1747 |
| 0.4925 | 0.9858 | 650 | 0.4657 | 0.4683 | 0.6309 | 0.5380 | 1.7545 | -91.7867 | -93.5412 | 0.2896 | 0.1523 | -2.1635 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
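
Since PEFT is listed among the framework versions, this repository presumably contains a PEFT adapter rather than fully merged weights. Below is a minimal loading sketch under that assumption; the repo IDs are taken from the card, and the prompt, dtype, and generation settings are arbitrary.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "hZzy/mistral-7b-sft-25-1"                      # SFT base listed above
adapter_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-last-3"   # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the adapter

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```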