# Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the dpo_sg_values_p025_OA_silver dataset. It achieves the following results on the evaluation set:
- Loss: 0.1344
- Rewards/chosen: -0.3139
- Rewards/rejected: -3.7636
- Rewards/accuracies: 0.9440
- Rewards/margins: 3.4497
- Logps/chosen: -5.2304
- Logps/rejected: -44.2580
- Logits/chosen: -0.7692
- Logits/rejected: -0.7949
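For context, these reward metrics follow the standard DPO formulation (a sketch; exact logging conventions depend on the TRL version): the implicit reward for a response is the β-scaled log-ratio between the policy and the reference model, and the margin is the gap between the chosen and rejected rewards.

```latex
% Implicit DPO reward and the loss over preference pairs (x, y_w, y_l):
\hat{r}_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
\Big[ \log \sigma\big( \hat{r}_\theta(x, y_w) - \hat{r}_\theta(x, y_l) \big) \Big]
```

Rewards/accuracies is then the fraction of evaluation pairs for which the chosen reward exceeds the rejected reward.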
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
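These settings map directly onto TRL's `DPOConfig`. The sketch below is a hypothetical reproduction, not the authors' script: the dataset file, LoRA settings, and the default DPO `beta` are assumptions (only the hyperparameters listed above come from this card), and it assumes a recent TRL release in which `DPOTrainer` takes a `processing_class`.

```python
# Hypothetical reproduction sketch; only the hyperparameters listed above are
# taken from this card. Dataset path, LoRA settings, and beta are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder file: DPOTrainer expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("json", data_files="dpo_sg_values_p025_OA_silver.jsonl", split="train")

args = DPOConfig(
    output_dir="Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # 2 x 4 = total train batch size of 8 on one device
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

# The card only shows that PEFT was used; this LoRA config is illustrative.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```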
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6882 | 0.1495 | 250 | 0.6470 | -0.0027 | -0.0991 | 0.8740 | 0.0963 | -2.1189 | -7.6131 | -0.6586 | -0.6706 |
| 0.4574 | 0.2990 | 500 | 0.3221 | -0.1440 | -1.3807 | 0.8900 | 1.2367 | -3.5317 | -20.4297 | -0.7456 | -0.7637 |
| 0.1975 | 0.4486 | 750 | 0.1916 | -0.3061 | -2.8027 | 0.9300 | 2.4966 | -5.1523 | -34.6491 | -0.8112 | -0.8331 |
| 0.128 | 0.5981 | 1000 | 0.1556 | -0.3224 | -3.2965 | 0.9360 | 2.9741 | -5.3155 | -39.5871 | -0.7987 | -0.8227 |
| 0.1217 | 0.7476 | 1250 | 0.1421 | -0.3065 | -3.5741 | 0.9380 | 3.2675 | -5.1568 | -42.3631 | -0.7826 | -0.8077 |
| 0.0864 | 0.8971 | 1500 | 0.1368 | -0.3144 | -3.7298 | 0.9440 | 3.4154 | -5.2351 | -43.9201 | -0.7712 | -0.7968 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
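Since the framework list includes PEFT, this repository presumably hosts a LoRA adapter rather than full model weights. A minimal loading sketch, assuming the adapter lives at `Incomple/Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver` on top of the instruct base model:

```python
# Hypothetical loading sketch: assumes this repo hosts a PEFT (LoRA) adapter
# on top of meta-llama/Llama-3.1-8B-Instruct, as the PEFT version above suggests.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "Incomple/Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```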