mistral-7b-expo-7b-L2EXPO-25-5
This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4509
- Objective: 0.4583
- Reward Accuracy: 0.6600
- Logp Accuracy: 0.6477
- Log Diff Policy: 14.9476
- Chosen Logps: -137.8466
- Rejected Logps: -152.7942
- Chosen Rewards: -0.4362
- Rejected Rewards: -0.5749
- Logits: -2.0296
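
Since PEFT is listed under the framework versions below, this repository most likely hosts adapter weights rather than full model weights. The snippet below is a minimal, unofficial usage sketch under that assumption; the prompt is illustrative, and if the adapter repo does not bundle a tokenizer it may need to be loaded from the base SFT checkpoint instead.

```python
# Minimal usage sketch (not from the original training code).
# Assumes this repo hosts a PEFT adapter on top of hZzy/mistral-7b-sft-25-1.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

repo_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-5"

# AutoPeftModelForCausalLM reads the adapter config, downloads the base model
# it points to, and attaches the adapter weights in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# If no tokenizer is shipped with the adapter, load it from the base checkpoint.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```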
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 216
- total_eval_batch_size: 18
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
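
The training script itself is not included in this card (the model name suggests an EXPO-style preference-optimization recipe), but as a hedged sketch the values listed above would map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder, and running across 6 GPUs is handled by the launcher rather than these arguments.

```python
# Hedged reconstruction only: maps the hyperparameters listed above onto
# transformers.TrainingArguments. The actual EXPO/preference trainer and its
# loss-specific options are not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-5",  # placeholder output path (assumption)
    learning_rate=5e-6,
    per_device_train_batch_size=3,   # train_batch_size (per device)
    per_device_eval_batch_size=3,    # eval_batch_size (per device)
    gradient_accumulation_steps=12,  # 3 per device * 6 GPUs * 12 steps = 216 total
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Distribution over 6 devices (distributed_type: multi-GPU) comes from
# accelerate/torchrun at launch time, not from TrainingArguments itself.
```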
Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.616 | 0.2275 | 75 | 0.5055 | 0.5089 | 0.5727 | 0.5375 | 1.8142 | -89.7217 | -91.5358 | 0.0450 | 0.0377 | -2.1516 |
| 0.5634 | 0.4550 | 150 | 0.4844 | 0.4942 | 0.5906 | 0.5889 | 7.4177 | -111.7094 | -119.1271 | -0.1748 | -0.2383 | -1.9188 |
| 0.5078 | 0.6825 | 225 | 0.4643 | 0.4770 | 0.6219 | 0.6180 | 11.5717 | -139.8274 | -151.3991 | -0.4560 | -0.5610 | -2.1061 |
| 0.4821 | 0.9100 | 300 | 0.4602 | 0.4685 | 0.6538 | 0.6465 | 15.8975 | -137.3998 | -153.2973 | -0.4317 | -0.5800 | -2.0229 |
Framework versions
- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
Model tree for hZzy/mistral-7b-expo-7b-L2EXPO-25-5
- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
- Finetuned: hZzy/mistral-7b-sft-25-1