# mistral-7b-expo-7b-L2EXPO-25-7
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set (a sketch of how these statistics are computed follows the list):
- Loss: 0.4452
- Objective: 0.4429
- Reward Accuracy: 0.6574
- Logp Accuracy: 0.6451
- Log Diff Policy: 17.7719
- Chosen Logps: -193.5036
- Rejected Logps: -211.2754
- Chosen Rewards: -0.9882
- Rejected Rewards: -1.1621
- Logits: -1.8933
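
These statistics follow standard preference-optimization bookkeeping: for each example, the summed log-probabilities of the chosen and rejected responses are compared, and implicit rewards are scaled log-ratios against a reference policy. The sketch below shows how the reported quantities relate to each other; the exact L2EXPO objective is not documented in this card, and the reward scale `beta` is an assumed illustrative value.

```python
# Sketch only: assumed DPO-style bookkeeping; beta=0.01 is illustrative, not from this card.
import torch

def preference_stats(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.01) -> dict:
    # "Log Diff Policy": margin between chosen and rejected log-probs under the policy.
    log_diff_policy = policy_chosen_logps - policy_rejected_logps
    # "Chosen/Rejected Rewards": scaled log-ratio of policy to reference on each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "logp_accuracy": (policy_chosen_logps > policy_rejected_logps).float().mean(),
        "reward_accuracy": (chosen_rewards > rejected_rewards).float().mean(),
        "log_diff_policy": log_diff_policy.mean(),
        "chosen_logps": policy_chosen_logps.mean(),
        "rejected_logps": policy_rejected_logps.mean(),
        "chosen_rewards": chosen_rewards.mean(),
        "rejected_rewards": rejected_rewards.mean(),
    }
```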
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
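
The values above map roughly onto a `transformers` `TrainingArguments` object, as in the minimal sketch below; `output_dir` and `bf16` are illustrative placeholders that are not recorded in this card.

```python
# Sketch of a TrainingArguments configuration matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-7",  # placeholder, not taken from the card
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    seed=42,
    gradient_accumulation_steps=12,   # 3 GPUs x 3 per device x 12 steps = 108 effective batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption: the mixed-precision setting is not stated
)
```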
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6123 | 0.1517 | 100 | 0.5111 | 0.5082 | 0.5386 | 0.5193 | 0.4932 | -92.5590 | -93.0521 | 0.0212 | 0.0201 | -2.1823 |
0.586 | 0.3033 | 200 | 0.4950 | 0.4930 | 0.5886 | 0.5540 | 2.7175 | -94.1884 | -96.9058 | 0.0049 | -0.0184 | -2.0328 |
0.5397 | 0.4550 | 300 | 0.4754 | 0.4737 | 0.6337 | 0.6169 | 10.7924 | -129.7178 | -140.5102 | -0.3504 | -0.4545 | -1.8133 |
0.492 | 0.6067 | 400 | 0.4631 | 0.4616 | 0.6485 | 0.6401 | 13.2720 | -123.6336 | -136.9056 | -0.2895 | -0.4184 | -2.1057 |
0.5057 | 0.7583 | 500 | 0.4569 | 0.4552 | 0.6569 | 0.6530 | 16.0166 | -157.9163 | -173.9330 | -0.6323 | -0.7887 | -2.0772 |
0.4562 | 0.9100 | 600 | 0.4556 | 0.4556 | 0.6683 | 0.6639 | 17.9609 | -185.6944 | -203.6553 | -0.9101 | -1.0859 | -2.0650 |
0.4627 | 1.0617 | 700 | 0.4463 | 0.4468 | 0.6683 | 0.6580 | 17.2045 | -193.8147 | -211.0192 | -0.9913 | -1.1595 | -2.0785 |
0.4675 | 1.2133 | 800 | 0.4477 | 0.4481 | 0.6667 | 0.6586 | 18.6212 | -171.0579 | -189.6790 | -0.7638 | -0.9461 | -2.0805 |
0.4631 | 1.3650 | 900 | 0.4523 | 0.4507 | 0.6697 | 0.6614 | 19.3812 | -184.6346 | -204.0157 | -0.8995 | -1.0895 | -2.1364 |
0.4706 | 1.5167 | 1000 | 0.4428 | 0.4430 | 0.6636 | 0.6535 | 18.2421 | -175.5473 | -193.7894 | -0.8087 | -0.9872 | -2.0802 |
0.4404 | 1.6684 | 1100 | 0.4509 | 0.4525 | 0.6644 | 0.6544 | 19.2144 | -205.9645 | -225.1790 | -1.1128 | -1.3011 | -2.0013 |
0.4086 | 1.8200 | 1200 | 0.4418 | 0.4425 | 0.6742 | 0.6611 | 20.1585 | -203.0412 | -223.1997 | -1.0836 | -1.2814 | -2.0214 |
0.4211 | 1.9717 | 1300 | 0.4377 | 0.4377 | 0.6636 | 0.6488 | 17.4303 | -204.1001 | -221.5304 | -1.0942 | -1.2647 | -1.9872 |
0.3854 | 2.1234 | 1400 | 0.4415 | 0.4413 | 0.6616 | 0.6521 | 17.5702 | -221.2297 | -238.7999 | -1.2655 | -1.4374 | -1.9931 |
0.4044 | 2.2750 | 1500 | 0.4484 | 0.4486 | 0.6644 | 0.6527 | 20.0957 | -200.4279 | -220.5237 | -1.0575 | -1.2546 | -1.9230 |
0.4357 | 2.4267 | 1600 | 0.4485 | 0.4484 | 0.6703 | 0.6600 | 20.6916 | -175.1429 | -195.8344 | -0.8046 | -1.0077 | -1.9267 |
0.4092 | 2.5784 | 1700 | 0.4627 | 0.4633 | 0.6641 | 0.6555 | 22.5421 | -193.3455 | -215.8876 | -0.9866 | -1.2082 | -1.8966 |
0.4004 | 2.7300 | 1800 | 0.4512 | 0.4522 | 0.6625 | 0.6516 | 19.5557 | -202.4462 | -222.0019 | -1.0776 | -1.2694 | -1.8232 |
0.3783 | 2.8817 | 1900 | 0.4530 | 0.4514 | 0.6630 | 0.6485 | 20.0630 | -199.5361 | -219.5991 | -1.0485 | -1.2453 | -1.8925 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
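
With these versions, the adapter can be loaded roughly as follows. This is a sketch that assumes the repository hosts a PEFT adapter on top of hZzy/mistral-7b-sft-25-1 (as the model tree below indicates); the dtype and device placement are illustrative.

```python
# Sketch: load the base SFT model, then attach this adapter with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "hZzy/mistral-7b-sft-25-1", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("hZzy/mistral-7b-sft-25-1")
model = PeftModel.from_pretrained(base, "hZzy/mistral-7b-expo-7b-L2EXPO-25-7")
```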
## Model tree for hZzy/mistral-7b-expo-7b-L2EXPO-25-7

- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
- Finetuned: hZzy/mistral-7b-sft-25-1 (direct parent of this adapter)