# mistral-7b-expo-7b-L2EXPO-25-final-2
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset.
It achieves the following results on the evaluation set (see the note after the list for how the margin metrics relate to the per-sequence values):
- Loss: 0.4637
- Objective: 0.4623
- Reward Accuracy: 0.6510
- Logp Accuracy: 0.5705
- Log Diff Policy: 3.5252
- Chosen Logps: -91.4483
- Rejected Logps: -94.9735
- Chosen Rewards: 0.1617
- Rejected Rewards: 0.0045
- Logits: -2.0668
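
As a reading aid, the margin-style metrics above can be reproduced from the per-sequence statistics. The snippet below is a minimal sketch (not the evaluation code) of the assumed relationships: Log Diff Policy as the gap between chosen and rejected log-probabilities, and the reward margin as the gap between chosen and rejected rewards.

```python
# Minimal sketch of how the margin metrics above are assumed to relate
# to the per-sequence statistics reported in this card (not the actual eval code).
chosen_logps, rejected_logps = -91.4483, -94.9735
chosen_rewards, rejected_rewards = 0.1617, 0.0045

log_diff_policy = chosen_logps - rejected_logps    # 3.5252, matches "Log Diff Policy"
reward_margin = chosen_rewards - rejected_rewards  # 0.1572, the chosen/rejected reward gap

print(f"log diff policy: {log_diff_policy:.4f}")
print(f"reward margin:   {reward_margin:.4f}")
```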
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they could map to a `transformers.TrainingArguments` config follows the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 2
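
For orientation only, here is a minimal sketch of how the values above could be expressed with `transformers.TrainingArguments`. The actual training script (trainer class, output path, precision settings, and any EXPO-specific options) is not published in this card, so treat the mapping as an assumption rather than the author's configuration.

```python
# Hypothetical mapping of the listed hyperparameters onto transformers.TrainingArguments.
# This is NOT the author's training script; it only restates the values above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-final-2",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=3,    # train_batch_size per device
    per_device_eval_batch_size=3,     # eval_batch_size per device
    gradient_accumulation_steps=12,   # 3 per device * 3 GPUs * 12 steps = 108 effective batch
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.2,
    optim="adamw_torch",              # Adam with betas=(0.9, 0.999) and epsilon=1e-08 (the defaults)
)
```

The multi-GPU setup (3 devices) would typically come from the launcher (for example `accelerate launch` or `torchrun`) rather than from `TrainingArguments` itself.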
### Training results

| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.6143 | 0.1213 | 80 | 0.5109 | 0.5074 | 0.5414 | 0.5154 | 0.4308 | -93.9032 | -94.3340 | 0.0390 | 0.0365 | -2.2003 |
| 0.5624 | 0.2427 | 160 | 0.5040 | 0.5009 | 0.5713 | 0.5210 | 0.7341 | -92.2105 | -92.9446 | 0.1236 | 0.1060 | -2.1577 |
| 0.5296 | 0.3640 | 240 | 0.4863 | 0.4843 | 0.6079 | 0.5316 | 1.4806 | -93.1361 | -94.6168 | 0.0773 | 0.0224 | -2.1544 |
| 0.5062 | 0.4853 | 320 | 0.4769 | 0.4758 | 0.6295 | 0.5467 | 2.3729 | -88.3420 | -90.7150 | 0.3170 | 0.2175 | -2.2017 |
| 0.489 | 0.6067 | 400 | 0.4707 | 0.4703 | 0.6418 | 0.5576 | 2.7726 | -89.4151 | -92.1878 | 0.2634 | 0.1438 | -2.2280 |
| 0.4965 | 0.7280 | 480 | 0.4679 | 0.4692 | 0.6331 | 0.5610 | 2.8915 | -89.0811 | -91.9726 | 0.2801 | 0.1546 | -2.2266 |
| 0.4905 | 0.8493 | 560 | 0.4693 | 0.4712 | 0.6390 | 0.5607 | 3.1056 | -91.1217 | -94.2273 | 0.1780 | 0.0419 | -2.2026 |
| 0.4547 | 0.9707 | 640 | 0.4653 | 0.4671 | 0.6353 | 0.5624 | 3.1047 | -92.8997 | -96.0044 | 0.0891 | -0.0470 | -2.1916 |
| 0.4497 | 1.0920 | 720 | 0.4672 | 0.4683 | 0.6404 | 0.5660 | 3.3740 | -84.0528 | -87.4268 | 0.5315 | 0.3819 | -2.1664 |
| 0.4358 | 1.2133 | 800 | 0.4617 | 0.4629 | 0.6398 | 0.5629 | 3.2392 | -85.6140 | -88.8532 | 0.4534 | 0.3106 | -2.1284 |
| 0.4572 | 1.3347 | 880 | 0.4665 | 0.4689 | 0.6398 | 0.5682 | 3.3859 | -89.4141 | -92.8000 | 0.2634 | 0.1132 | -2.1583 |
| 0.4362 | 1.4560 | 960 | 0.4647 | 0.4657 | 0.6395 | 0.5702 | 3.5918 | -86.3147 | -89.9066 | 0.4184 | 0.2579 | -2.1314 |
| 0.3976 | 1.5774 | 1040 | 0.4635 | 0.4660 | 0.6365 | 0.5685 | 3.5224 | -88.0452 | -91.5676 | 0.3319 | 0.1748 | -2.1032 |
| 0.4082 | 1.6987 | 1120 | 0.4628 | 0.4651 | 0.6429 | 0.5707 | 3.4228 | -85.0947 | -88.5175 | 0.4794 | 0.3273 | -2.0848 |
| 0.4037 | 1.8200 | 1200 | 0.4621 | 0.4621 | 0.6449 | 0.5775 | 3.6505 | -93.7257 | -97.3762 | 0.0478 | -0.1156 | -2.0348 |
| 0.3961 | 1.9414 | 1280 | 0.4607 | 0.4613 | 0.6418 | 0.5744 | 3.4066 | -90.0586 | -93.4653 | 0.2312 | 0.0800 | -2.0810 |
### Framework versions

- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
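
The presence of PEFT in the framework versions suggests this repository holds a PEFT (LoRA-style) adapter on top of the SFT base rather than merged full weights; that is an inference from the version list, not something the card states. A minimal loading sketch under that assumption:

```python
# Minimal loading sketch, assuming this repo is a PEFT adapter for hZzy/mistral-7b-sft-25-1.
# If the repository actually contains merged full weights, use
# transformers.AutoModelForCausalLM.from_pretrained directly instead.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-final-2"
model = AutoPeftModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# The tokenizer may need to be loaded from the base model if it is not included in this repo.
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain what preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```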