# mistral-7b-expo-7b-L2EXPO-25-last-1
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4397
- Objective: 0.4403
- Reward Accuracy: 0.6644
- Logp Accuracy: 0.6488
- Log Diff Policy: 15.5945
- Chosen Logps: -164.8095
- Rejected Logps: -180.4040
- Chosen Rewards: -0.7013
- Rejected Rewards: -0.8534
- Logits: -2.0380
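
The card does not ship usage code. Below is a minimal loading sketch, assuming this repository hosts a PEFT (LoRA-style) adapter on top of the SFT base `hZzy/mistral-7b-sft-25-1`, as suggested by the PEFT entry under framework versions; if the adapter has been merged into the weights, load the repository directly with `AutoModelForCausalLM` instead. The prompt is only an illustration.

```python
# Minimal sketch: load the adapter on top of its SFT base model (assumption: PEFT adapter repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hZzy/mistral-7b-sft-25-1"
adapter_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-last-1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

inputs = tokenizer("How do rainbows form?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```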
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
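
The training script is not included with this card. As a hedged sketch, the listed values map onto `transformers.TrainingArguments` as follows (the effective train batch size of 108 comes from 3 per device × 3 GPUs × 12 accumulation steps); `output_dir` and the precision flag are assumptions.

```python
# Sketch only: the listed hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-last-1",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,   # 3 per device x 3 GPUs x 12 steps = 108 effective
    num_train_epochs=1,
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; precision is not stated on the card
)
```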
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5865 | 0.0758 | 50 | 0.5115 | 0.5085 | 0.5296 | 0.5176 | 0.4580 | -92.9565 | -93.4145 | 0.0173 | 0.0165 | -2.2007 |
| 0.6073 | 0.1517 | 100 | 0.5074 | 0.5043 | 0.5671 | 0.5271 | 0.9476 | -93.0270 | -93.9747 | 0.0166 | 0.0109 | -2.2026 |
| 0.627 | 0.2275 | 150 | 0.4960 | 0.4934 | 0.5808 | 0.5506 | 2.8875 | -94.9425 | -97.8300 | -0.0026 | -0.0277 | -2.0708 |
| 0.5403 | 0.3033 | 200 | 0.4882 | 0.4891 | 0.5920 | 0.5909 | 7.7676 | -112.2413 | -120.0089 | -0.1756 | -0.2494 | -1.9125 |
| 0.5335 | 0.3792 | 250 | 0.4786 | 0.4788 | 0.6093 | 0.6035 | 9.3911 | -115.1676 | -124.5586 | -0.2049 | -0.2949 | -1.9601 |
| 0.5282 | 0.4550 | 300 | 0.4665 | 0.4686 | 0.6376 | 0.6306 | 12.4430 | -143.1194 | -155.5625 | -0.4844 | -0.6050 | -1.9915 |
| 0.4989 | 0.5308 | 350 | 0.4669 | 0.4697 | 0.6370 | 0.6323 | 14.0382 | -141.6395 | -155.6777 | -0.4696 | -0.6061 | -2.1328 |
| 0.4806 | 0.6067 | 400 | 0.4599 | 0.4608 | 0.6493 | 0.6370 | 14.2207 | -129.4153 | -143.6360 | -0.3473 | -0.4857 | -2.1904 |
| 0.4752 | 0.6825 | 450 | 0.4531 | 0.4555 | 0.6546 | 0.6454 | 15.4063 | -148.1931 | -163.5994 | -0.5351 | -0.6853 | -2.1390 |
| 0.4943 | 0.7583 | 500 | 0.4544 | 0.4545 | 0.6572 | 0.6535 | 17.1657 | -148.8972 | -166.0628 | -0.5421 | -0.7100 | -2.1458 |
| 0.4309 | 0.8342 | 550 | 0.4602 | 0.4618 | 0.6577 | 0.6463 | 17.9948 | -158.6875 | -176.6823 | -0.6401 | -0.8162 | -2.1403 |
| 0.4533 | 0.9100 | 600 | 0.4523 | 0.4541 | 0.6636 | 0.6521 | 17.8286 | -180.8165 | -198.6450 | -0.8613 | -1.0358 | -2.1089 |
| 0.4929 | 0.9858 | 650 | 0.4402 | 0.4397 | 0.6709 | 0.6530 | 15.6406 | -138.0694 | -153.7100 | -0.4339 | -0.5865 | -2.0532 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
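
As a convenience, a quick way to check that a local environment matches the versions listed above (package names as on PyPI):

```python
# Print the installed versions of the libraries listed on this card.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```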
## Model tree for hZzy/mistral-7b-expo-7b-L2EXPO-25-last-1

- mistralai/Mistral-7B-v0.3 (base model)
- mistralai/Mistral-7B-Instruct-v0.3 (fine-tuned)
- hZzy/mistral-7b-sft-25-1 (fine-tuned; direct parent of this model)