mistral-7b-expo-7b-L2EXPO-25-last-2
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set (see the note after the list for how these metrics relate to one another):
- Loss: 0.4606
- Objective: 0.4633
- Reward Accuracy: 0.6379
- Logp Accuracy: 0.5626
- Log Diff Policy: 3.1175
- Chosen Logps: -85.1361
- Rejected Logps: -88.2536
- Chosen Rewards: 0.4773
- Rejected Rewards: 0.3405
- Logits: -2.1479
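These metric names are not defined in the card itself. As a hedged reading based on common preference-optimization reporting, Log Diff Policy appears to be the gap between the chosen and rejected sequence log-probabilities, and the reward margin is the difference between chosen and rejected rewards. A minimal check in Python, using the values listed above:

```python
# Hedged sketch: assumes "Log Diff Policy" = Chosen Logps - Rejected Logps.
# Only the arithmetic below is verified against the numbers reported in this card.
chosen_logps, rejected_logps = -85.1361, -88.2536
chosen_reward, rejected_reward = 0.4773, 0.3405

log_diff_policy = chosen_logps - rejected_logps
print(f"log diff policy: {log_diff_policy:.4f}")                  # 3.1175, matching the reported value
print(f"reward margin:   {chosen_reward - rejected_reward:.4f}")  # 0.1368
```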
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
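The training script itself is not included in this card. As a sketch only, the listed values map naturally onto a standard `transformers.TrainingArguments` configuration; argument names below follow the Transformers API, and `output_dir` plus anything not listed above are assumptions.

```python
# Hedged sketch: how the listed hyperparameters might be expressed with
# transformers.TrainingArguments. Not the authors' actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-last-2",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=3,   # train_batch_size
    per_device_eval_batch_size=3,    # eval_batch_size
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the transformers
    # defaults (adam_beta1, adam_beta2, adam_epsilon), so no override is needed.
)

# Effective batch sizes with 3 GPUs (distributed_type: multi-GPU):
#   train: 3 per device * 3 devices * 12 accumulation steps = 108
#   eval:  3 per device * 3 devices = 9
```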
Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.584 | 0.0758 | 50 | 0.5099 | 0.5074 | 0.5355 | 0.5159 | 0.4483 | -92.8705 | -93.3187 | 0.0906 | 0.0873 | -2.1995 |
0.5893 | 0.1517 | 100 | 0.4987 | 0.4957 | 0.5761 | 0.5254 | 0.8809 | -93.2981 | -94.1790 | 0.0692 | 0.0443 | -2.2143 |
0.611 | 0.2275 | 150 | 0.4887 | 0.4867 | 0.5968 | 0.5350 | 1.5013 | -90.4754 | -91.9767 | 0.2103 | 0.1544 | -2.1755 |
0.5447 | 0.3033 | 200 | 0.4782 | 0.4772 | 0.6174 | 0.5447 | 1.8866 | -89.0557 | -90.9423 | 0.2813 | 0.2061 | -2.1720 |
0.5213 | 0.3792 | 250 | 0.4763 | 0.4778 | 0.6219 | 0.5529 | 2.3081 | -90.9116 | -93.2196 | 0.1885 | 0.0922 | -2.1856 |
0.5035 | 0.4550 | 300 | 0.4727 | 0.4743 | 0.6342 | 0.5543 | 2.7025 | -90.4785 | -93.1811 | 0.2102 | 0.0942 | -2.2327 |
0.5007 | 0.5308 | 350 | 0.4686 | 0.4695 | 0.6359 | 0.5543 | 2.6978 | -89.0034 | -91.7011 | 0.2839 | 0.1682 | -2.2195 |
0.4754 | 0.6067 | 400 | 0.4686 | 0.4713 | 0.6429 | 0.5584 | 2.9728 | -89.0680 | -92.0408 | 0.2807 | 0.1512 | -2.2204 |
0.4719 | 0.6825 | 450 | 0.4637 | 0.4657 | 0.6339 | 0.5596 | 2.9869 | -86.8977 | -89.8847 | 0.3892 | 0.2590 | -2.2168 |
0.5058 | 0.7583 | 500 | 0.4702 | 0.4725 | 0.6384 | 0.5587 | 3.1565 | -86.7702 | -89.9267 | 0.3956 | 0.2569 | -2.1938 |
0.4449 | 0.8342 | 550 | 0.4669 | 0.4706 | 0.6337 | 0.5629 | 3.1811 | -87.7497 | -90.9308 | 0.3466 | 0.2067 | -2.1924 |
0.4532 | 0.9100 | 600 | 0.4612 | 0.4644 | 0.6387 | 0.5635 | 3.0577 | -90.2947 | -93.3524 | 0.2194 | 0.0856 | -2.1974 |
0.4937 | 0.9858 | 650 | 0.4624 | 0.4653 | 0.6387 | 0.5621 | 3.1203 | -89.9034 | -93.0237 | 0.2389 | 0.1020 | -2.1556 |
Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
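Since PEFT is listed among the framework versions, this repository presumably hosts a PEFT adapter rather than full model weights. A minimal, hedged loading sketch compatible with those versions (the dtype and generation settings are illustrative, not taken from the training setup):

```python
# Hedged sketch: assumes this repo is a PEFT adapter whose config points at its
# base model and that a tokenizer is bundled with the adapter files.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-last-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```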
Model tree for hZzy/mistral-7b-expo-7b-L2EXPO-25-last-2
- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
- Finetuned: hZzy/mistral-7b-sft-25-1