---
base_model: hZzy/mistral-7b-sft-25-1
datasets:
- hZzy/direction_right2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
model-index:
- name: mistral-7b-expo-7b-L2EXPO-25-8
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/1j4l9dwx)

# mistral-7b-expo-7b-L2EXPO-25-8

This model is a fine-tuned version of [hZzy/mistral-7b-sft-25-1](https://huggingface.co/hZzy/mistral-7b-sft-25-1) on the hZzy/direction_right2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4420
- Objective: 0.4416
- Reward Accuracy: 0.6655
- Logp Accuracy: 0.6499
- Log Diff Policy: 15.7908
- Chosen Logps: -168.1699
- Rejected Logps: -183.9607
- Chosen Rewards: -0.7349
- Rejected Rewards: -0.8890
- Logits: -2.0174

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 108
- total_eval_batch_size: 9
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.6077 | 0.1517 | 100 | 0.5074 | 0.5043 | 0.5702 | 0.5254 | 0.9522 | -93.2911 | -94.2433 | 0.0139 | 0.0082 | -2.2027 |
| 0.5413 | 0.3033 | 200 | 0.4873 | 0.4899 | 0.6037 | 0.5937 | 8.2778 | -125.3062 | -133.5840 | -0.3062 | -0.3852 | -1.9290 |
| 0.5306 | 0.4550 | 300 | 0.4690 | 0.4694 | 0.6373 | 0.6267 | 12.8023 | -128.2087 | -141.0110 | -0.3353 | -0.4595 | -1.9229 |
| 0.4809 | 0.6067 | 400 | 0.4566 | 0.4556 | 0.6507 | 0.6390 | 13.4130 | -110.8082 | -124.2212 | -0.1613 | -0.2916 | -2.1010 |
| 0.5005 | 0.7583 | 500 | 0.4521 | 0.4527 | 0.6566 | 0.6538 | 16.5186 | -141.8083 | -158.3269 | -0.4713 | -0.6326 | -2.0970 |
| 0.4543 | 0.9100 | 600 | 0.4501 | 0.4510 | 0.6692 | 0.6586 | 17.7658 | -179.5024 | -197.2682 | -0.8482 | -1.0220 | -2.0605 |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
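
### Training configuration (sketch)

The snippet below is a minimal sketch of how the hyperparameters listed above map onto a standard `transformers.TrainingArguments` object. The actual run was produced by an alignment-handbook/TRL-style training script whose exact config class is not shown in this card, so the output directory name and any field not listed above are assumptions. Note that the reported total train batch size of 108 follows from 3 (per-device) x 3 (GPUs) x 12 (gradient accumulation steps).

```python
from transformers import TrainingArguments

# Mirrors only the values reported in "Training hyperparameters"; all other
# fields keep their library defaults. This is not the original training config.
training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-8",  # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```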
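
## How to use

A minimal loading sketch, assuming this repository ships a PEFT (LoRA-style) adapter on top of [hZzy/mistral-7b-sft-25-1](https://huggingface.co/hZzy/mistral-7b-sft-25-1), as the `library_name: peft` and `base_model` metadata indicate. The repository id below is inferred from the model name and should be adjusted if the adapter is hosted elsewhere.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-8"  # inferred repo id (assumption)

# Loads the base model referenced in the adapter config and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Tokenizer is taken from the SFT base model in case the adapter repo does not ship one.
tokenizer = AutoTokenizer.from_pretrained("hZzy/mistral-7b-sft-25-1")

prompt = "Explain what preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```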