# mistral-7b-expo-7b-L2EXPO-25-cos-1
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4434
- Objective: 0.4453
- Reward Accuracy: 0.6664
- Logp Accuracy: 0.6586
- Log Diff Policy: 16.9539
- Chosen Logps: -166.9353
- Rejected Logps: -183.8891
- Chosen Rewards: -0.7225
- Rejected Rewards: -0.8882
- Logits: -2.1863
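
The metrics above follow the usual preference-optimization bookkeeping: "Chosen/Rejected Logps" are summed log-probabilities of the preferred and dispreferred responses under the policy, "Chosen/Rejected Rewards" are the implicit rewards measured against a frozen reference model, and the accuracies count how often the chosen side wins. The training objective itself (the model name suggests an L2/EXPO variant) is not documented in this card, so the sketch below only illustrates these DPO-style conventions; the tensor values and the reward scale `beta` are made-up placeholders, not data from this run.

```python
import torch

# Illustrative per-example summed log-probabilities of the chosen and rejected
# responses under the fine-tuned policy and the frozen reference (SFT) model.
# All values are placeholders.
policy_chosen_logps = torch.tensor([-166.9, -150.2])
policy_rejected_logps = torch.tensor([-183.9, -170.4])
ref_chosen_logps = torch.tensor([-95.1, -98.0])
ref_rejected_logps = torch.tensor([-96.4, -99.3])

beta = 0.01  # reward scale; the value used for this run is not stated in the card

# Implicit rewards: scaled log-ratio of policy to reference (DPO convention).
chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

# "Reward Accuracy": fraction of pairs where the chosen response earns the higher reward.
reward_accuracy = (chosen_rewards > rejected_rewards).float().mean()

# "Logp Accuracy": fraction of pairs the policy ranks correctly by raw log-probability.
logp_accuracy = (policy_chosen_logps > policy_rejected_logps).float().mean()

# "Log Diff Policy": mean margin between chosen and rejected log-probs under the policy.
log_diff_policy = (policy_chosen_logps - policy_rejected_logps).mean()

print(reward_accuracy.item(), logp_accuracy.item(), log_diff_policy.item())
```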
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
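
For reference, the listed values imply an effective training batch size of 3 (per device) × 3 GPUs × 12 accumulation steps = 108, matching `total_train_batch_size`. Below is a minimal, hypothetical sketch of how these hyperparameters would map onto `transformers.TrainingArguments`; it is not the original training script, and options specific to the preference objective are omitted because they are not documented here.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# Effective train batch size = 3 per device * 3 GPUs * 12 accumulation steps = 108.
training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-cos-1",
    learning_rate=1e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",   # Adam(W) with betas=(0.9, 0.999), eps=1e-8 are the defaults
    bf16=True,             # assumption; the precision used is not stated in the card
    eval_strategy="steps",
    eval_steps=50,         # assumption; matches the evaluation interval in the table below
    logging_steps=50,
)
```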
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5848 | 0.0758 | 50 | 0.5104 | 0.5075 | 0.5501 | 0.5176 | 0.5694 | -90.5345 | -91.1039 | 0.0415 | 0.0396 | -2.1936 |
0.5838 | 0.1517 | 100 | 0.4946 | 0.4919 | 0.5819 | 0.5456 | 3.0042 | -108.4590 | -111.4633 | -0.1378 | -0.1640 | -2.1922 |
0.5739 | 0.2275 | 150 | 0.4730 | 0.4743 | 0.6328 | 0.6079 | 9.5035 | -140.8261 | -150.3296 | -0.4614 | -0.5527 | -2.0682 |
0.5202 | 0.3033 | 200 | 0.4668 | 0.4686 | 0.6365 | 0.6239 | 12.1596 | -131.0880 | -143.2477 | -0.3641 | -0.4818 | -2.1371 |
0.485 | 0.3792 | 250 | 0.4593 | 0.4595 | 0.6446 | 0.6317 | 13.0801 | -119.5203 | -132.6004 | -0.2484 | -0.3754 | -2.0672 |
0.4961 | 0.4550 | 300 | 0.4573 | 0.4597 | 0.6619 | 0.6602 | 17.1219 | -161.8232 | -178.9452 | -0.6714 | -0.8388 | -2.1184 |
0.4719 | 0.5308 | 350 | 0.4516 | 0.4534 | 0.6641 | 0.6538 | 16.2662 | -161.6451 | -177.9113 | -0.6696 | -0.8285 | -2.1018 |
0.4431 | 0.6067 | 400 | 0.4464 | 0.4470 | 0.6622 | 0.6594 | 16.4637 | -136.6935 | -153.1571 | -0.4201 | -0.5809 | -2.2011 |
0.4562 | 0.6825 | 450 | 0.4448 | 0.4470 | 0.6625 | 0.6558 | 17.0393 | -156.8561 | -173.8954 | -0.6217 | -0.7883 | -2.1870 |
0.4779 | 0.7583 | 500 | 0.4508 | 0.4536 | 0.6647 | 0.6580 | 18.3597 | -167.1661 | -185.5258 | -0.7248 | -0.9046 | -2.1974 |
0.4289 | 0.8342 | 550 | 0.4453 | 0.4474 | 0.6628 | 0.6580 | 17.1657 | -159.0477 | -176.2134 | -0.6437 | -0.8115 | -2.1877 |
0.4413 | 0.9100 | 600 | 0.4430 | 0.4451 | 0.6664 | 0.6572 | 16.7940 | -166.5301 | -183.3241 | -0.7185 | -0.8826 | -2.1875 |
0.4902 | 0.9858 | 650 | 0.4435 | 0.4456 | 0.6658 | 0.6574 | 16.9345 | -167.0006 | -183.9351 | -0.7232 | -0.8887 | -2.1872 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
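
Given the PEFT entry above, the repository presumably ships a LoRA-style adapter rather than full model weights. A minimal loading sketch under that assumption (a standard `adapter_config.json` pointing at the base model, tokenizer files present in the repo) could look like this:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-cos-1"

# AutoPeftModelForCausalLM reads adapter_config.json, loads the referenced base
# model, and applies the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# If the tokenizer files are not included in the adapter repo, load them from the
# SFT base model (hZzy/mistral-7b-sft-25-1) instead.
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain the trade-offs of preference optimization after SFT."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```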
## Model tree

- Base model: mistralai/Mistral-7B-v0.3
  - Finetuned: mistralai/Mistral-7B-Instruct-v0.3
    - Finetuned: hZzy/mistral-7b-sft-25-1
      - This model: hZzy/mistral-7b-expo-7b-L2EXPO-25-cos-1