# mistral-7b-expo-7b-L2EXPO-25-smallr-1

This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4531
- Objective: 0.4545
- Reward Accuracy: 0.6563
- Logp Accuracy: 0.6493
- Log Diff Policy: 15.4503
- Chosen Logps: -148.5083
- Rejected Logps: -163.9586
- Chosen Rewards: -0.5383
- Rejected Rewards: -0.6889
- Logits: -2.1895
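
Since PEFT is listed under Framework versions below, this repository presumably hosts a parameter-efficient (LoRA-style) adapter on top of the SFT checkpoint rather than merged full weights. The snippet below is a minimal loading sketch under that assumption; the dtype, device placement, and prompt are illustrative choices, not settings taken from the training run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hZzy/mistral-7b-sft-25-1"                       # SFT base named above
adapter_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-smallr-1"  # this repository (assumed to be an adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # illustrative precision
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter weights
model.eval()

prompt = "Briefly explain what preference optimization adds on top of supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```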
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 1e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
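
For reference, the hyperparameter list maps onto `transformers.TrainingArguments` roughly as below. This is a hedged sketch only: the actual training script and trainer class (presumably a preference-optimization trainer) are not part of this card, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-smallr-1",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,   # first 10% of steps are warmup, then cosine decay
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)

# Effective batch sizes implied by the list above:
#   train: 3 per device * 3 GPUs * 12 accumulation steps = 108  (total_train_batch_size)
#   eval:  3 per device * 3 GPUs                          = 9   (total_eval_batch_size)
```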
### Training results

The intermediate evaluations below track the same metrics as the summary at the top of the card; a short sketch of how the paired columns relate follows the table.
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5866 | 0.0758 | 50 | 0.5114 | 0.5084 | 0.5481 | 0.5168 | 0.4644 | -93.1551 | -93.6196 | 0.0153 | 0.0144 | -2.2005 |
0.6029 | 0.1517 | 100 | 0.5040 | 0.5011 | 0.5741 | 0.5316 | 1.3657 | -93.8686 | -95.2344 | 0.0081 | -0.0017 | -2.1831 |
0.6165 | 0.2275 | 150 | 0.4877 | 0.4856 | 0.5970 | 0.5741 | 5.4287 | -98.6006 | -104.0293 | -0.0392 | -0.0896 | -2.0756 |
0.5324 | 0.3033 | 200 | 0.4748 | 0.4791 | 0.6172 | 0.6110 | 9.8505 | -116.5340 | -126.3844 | -0.2185 | -0.3132 | -2.1418 |
0.5089 | 0.3792 | 250 | 0.4679 | 0.4712 | 0.6306 | 0.6222 | 11.0787 | -118.1832 | -129.2619 | -0.2350 | -0.3420 | -2.2452 |
0.5254 | 0.4550 | 300 | 0.4669 | 0.4693 | 0.6479 | 0.6387 | 14.2479 | -134.9546 | -149.2025 | -0.4027 | -0.5414 | -2.1789 |
0.4904 | 0.5308 | 350 | 0.4571 | 0.4582 | 0.6477 | 0.6423 | 12.9700 | -138.0092 | -150.9792 | -0.4333 | -0.5591 | -2.2293 |
0.4722 | 0.6067 | 400 | 0.4556 | 0.4563 | 0.6521 | 0.6479 | 13.8030 | -127.5593 | -141.3622 | -0.3288 | -0.4630 | -2.2377 |
0.4716 | 0.6825 | 450 | 0.4574 | 0.4604 | 0.6518 | 0.6443 | 15.1329 | -157.4561 | -172.5890 | -0.6277 | -0.7752 | -2.1945 |
0.5051 | 0.7583 | 500 | 0.4571 | 0.4591 | 0.6535 | 0.6513 | 15.8245 | -148.2936 | -164.1181 | -0.5361 | -0.6905 | -2.2074 |
0.4423 | 0.8342 | 550 | 0.4539 | 0.4550 | 0.6527 | 0.6513 | 15.3717 | -145.5679 | -160.9395 | -0.5089 | -0.6588 | -2.2040 |
0.465 | 0.9100 | 600 | 0.4529 | 0.4543 | 0.6549 | 0.6485 | 15.3658 | -148.1466 | -163.5124 | -0.5346 | -0.6845 | -2.1926 |
0.5092 | 0.9858 | 650 | 0.4531 | 0.4545 | 0.6541 | 0.6490 | 15.4559 | -148.5047 | -163.9607 | -0.5382 | -0.6890 | -2.1898 |
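
The paired columns follow the usual preference-optimization logging conventions. The sketch below restates their definitions and checks them against the evaluation summary at the top of the card; it is illustrative, not the evaluation code.

```python
import torch

# Final evaluation summary from the top of this card (mean per-pair values).
chosen_logps = torch.tensor([-148.5083])    # summed log-probs of chosen responses
rejected_logps = torch.tensor([-163.9586])  # summed log-probs of rejected responses

# "Log Diff Policy" is the gap between chosen and rejected log-probs.
log_diff_policy = (chosen_logps - rejected_logps).mean()        # ~15.45, matching the summary
# "Logp Accuracy" is the fraction of pairs where the policy assigns the chosen
# response a higher log-prob than the rejected one.
logp_accuracy = (chosen_logps > rejected_logps).float().mean()

# "Chosen/Rejected Rewards" in DPO-style objectives are typically
# beta * (policy_logps - reference_logps); beta and the reference log-probs are
# not reported here, so those columns cannot be recomputed from this card alone.
print(log_diff_policy.item(), logp_accuracy.item())
```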
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
### Model tree

mistralai/Mistral-7B-v0.3 → mistralai/Mistral-7B-Instruct-v0.3 → hZzy/mistral-7b-sft-25-1 → hZzy/mistral-7b-expo-7b-L2EXPO-25-smallr-1