# Llama-3.1-8B-Instruct-SAA-700
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_700 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0846
- Rewards/chosen: -0.0062
- Rewards/rejected: -0.0635
- Rewards/accuracies: 0.8857
- Rewards/margins: 0.0573
- Logps/rejected: -0.6353
- Logps/chosen: -0.0623
- Logits/rejected: -0.4422
- Logits/chosen: -0.3590
- SFT Loss: 0.0098
- Odds Ratio Loss: 0.7473
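Since the framework versions below list PEFT, this checkpoint is presumably a LoRA-style adapter on top of meta-llama/Llama-3.1-8B-Instruct rather than a full set of model weights. The following is a minimal inference sketch under that assumption; the adapter repo id `chchen/Llama-3.1-8B-Instruct-SAA-700` is assumed and should be adjusted if your copy lives elsewhere.

```python
# Minimal inference sketch, assuming this repo hosts a PEFT (LoRA) adapter
# for meta-llama/Llama-3.1-8B-Instruct. Adapter id is assumed, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-700"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```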
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
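The Rewards, SFT Loss, and Odds Ratio Loss metrics reported above are characteristic of ORPO-style preference optimization. As a rough illustration only, the listed hyperparameters could be expressed with TRL's `ORPOConfig` as sketched below; the actual training script, dataset wiring, and odds-ratio weight are not documented in this card and are assumptions.

```python
# Hedged sketch: mapping the listed hyperparameters onto TRL's ORPOConfig.
# The exact training framework and loss weighting used for this model are
# not stated in the card; output_dir is an assumed name.
from trl import ORPOConfig

training_args = ORPOConfig(
    output_dir="llama-3.1-8b-instruct-saa-700",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # effective train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam defaults already match betas=(0.9, 0.999) and epsilon=1e-08
)
```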
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | SFT Loss | Odds Ratio Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3943 | 1.2698 | 50 | 1.1722 | -0.1129 | -0.1708 | 0.8714 | 0.0579 | -1.7085 | -1.1294 | -0.5079 | -0.3998 | 0.1369 | 10.3529 |
| 0.2668 | 2.5397 | 100 | 0.1508 | -0.0126 | -0.0716 | 0.8857 | 0.0590 | -0.7158 | -0.1261 | -0.4935 | -0.3919 | 0.0160 | 1.3479 |
| 0.1305 | 3.8095 | 150 | 0.0939 | -0.0069 | -0.0601 | 0.8857 | 0.0531 | -0.6007 | -0.0692 | -0.4467 | -0.3595 | 0.0109 | 0.8298 |
| 0.126 | 5.0794 | 200 | 0.0885 | -0.0065 | -0.0608 | 0.8857 | 0.0542 | -0.6076 | -0.0653 | -0.4471 | -0.3614 | 0.0103 | 0.7822 |
| 0.0881 | 6.3492 | 250 | 0.0876 | -0.0064 | -0.0617 | 0.8857 | 0.0553 | -0.6175 | -0.0642 | -0.4433 | -0.3588 | 0.0102 | 0.7739 |
| 0.1042 | 7.6190 | 300 | 0.0846 | -0.0062 | -0.0635 | 0.8857 | 0.0573 | -0.6353 | -0.0623 | -0.4422 | -0.3590 | 0.0098 | 0.7473 |
| 0.1405 | 8.8889 | 350 | 0.0853 | -0.0063 | -0.0644 | 0.8857 | 0.0581 | -0.6435 | -0.0627 | -0.4405 | -0.3572 | 0.0099 | 0.7540 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- PyTorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0