---
base_model: hZzy/mistral-7b-sft-25-1
datasets:
- hZzy/direction_right2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
model-index:
- name: mistral-7b-expo-7b-IPO-25-2
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/cw0lh48k)

# mistral-7b-expo-7b-IPO-25-2

This model is a fine-tuned version of [hZzy/mistral-7b-sft-25-1](https://huggingface.co/hZzy/mistral-7b-sft-25-1) on the hZzy/direction_right2 dataset.
It achieves the following results on the evaluation set:
- Loss: 8.3755
- Objective: 8.4853
- Reward Accuracy: 0.6261
- Logp Accuracy: 0.5450
- Log Diff Policy: 2.3621
- Chosen Logps: -94.1866
- Rejected Logps: -96.5486
- Chosen Rewards: 0.0248
- Rejected Rewards: -0.0742
- Logits: -2.1694

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 9.358         | 0.1517 | 100  | 9.4767          | 9.4748    | 0.5584          | 0.5282        | 0.9202          | -92.3263     | -93.2465       | 0.1178         | 0.0909           | -2.1941 |
| 8.4865        | 0.3033 | 200  | 8.7651          | 8.8251    | 0.6121          | 0.5436        | 1.7419          | -92.0866     | -93.8284       | 0.1298         | 0.0618           | -2.2153 |
| 8.2077        | 0.4550 | 300  | 8.5507          | 8.6214    | 0.6264          | 0.5461        | 2.2164          | -92.5313     | -94.7477       | 0.1076         | 0.0158           | -2.2241 |
| 7.7431        | 0.6067 | 400  | 8.4722          | 8.5772    | 0.6314          | 0.5503        | 2.3954          | -90.6295     | -93.0249       | 0.2026         | 0.1020           | -2.2473 |
| 7.822         | 0.7583 | 500  | 8.4631          | 8.5834    | 0.6284          | 0.5445        | 2.3357          | -89.8860     | -92.2217       | 0.2398         | 0.1421           | -2.2090 |
| 7.6491        | 0.9100 | 600  | 8.4053          | 8.5433    | 0.6295          | 0.5445        | 2.2425          | -91.4902     | -93.7327       | 0.1596         | 0.0666           | -2.2060 |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
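
### Loading the adapter (sketch)

The card does not yet include usage instructions. Below is a minimal loading sketch, assuming this repository contains PEFT (LoRA) adapter weights for the base model listed in the metadata; the dtype, device placement, and example prompt are illustrative choices, not part of the original card.

```python
# Minimal sketch, assuming this repo hosts PEFT adapter weights for
# hZzy/mistral-7b-sft-25-1 (per the metadata above). torch_dtype and
# device_map are assumptions for illustration, not taken from the card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "hZzy/mistral-7b-sft-25-1"             # base SFT model
adapter_id = "hZzy/mistral-7b-expo-7b-IPO-25-2"  # this repository (adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter

prompt = "Explain IPO preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```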