Llama-3.1-8B-Instruct-SAA-700

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_700 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0846
  • Rewards/chosen: -0.0062
  • Rewards/rejected: -0.0635
  • Rewards/accuracies: 0.8857
  • Rewards/margins: 0.0573
  • Logps/rejected: -0.6353
  • Logps/chosen: -0.0623
  • Logits/rejected: -0.4422
  • Logits/chosen: -0.3590
  • Sft Loss: 0.0098
  • Odds Ratio Loss: 0.7473
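
To run inference, the adapter can be loaded on top of the base model with PEFT and Transformers. The snippet below is a minimal sketch, assuming the adapter is published as `chchen/Llama-3.1-8B-Instruct-SAA-700`, that you have access to the gated `meta-llama/Llama-3.1-8B-Instruct` base weights, and that bfloat16 with automatic device placement is an acceptable (illustrative) choice.

```python
# Minimal sketch: load the PEFT adapter on top of the base model and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-700"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # illustrative choice, not stated in the card
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "Summarize the key idea of preference optimization."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base_model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the adapter is LoRA-style, `model.merge_and_unload()` can fold it into the base weights, trading a larger checkpoint for slightly faster inference.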

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
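
As a rough mapping only (the exact trainer and training framework used for this run are not stated in the card), the settings above correspond to a Transformers `TrainingArguments` configuration along these lines; the output path and precision flag are placeholders:

```python
# Hedged sketch: the listed hyperparameters expressed as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-saa-700",  # placeholder output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size: 2 * 8 = 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam-style optimizer with the betas/epsilon listed above
    bf16=True,                       # assumption: precision is not stated in the card
)
```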

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3943 | 1.2698 | 50 | 1.1722 | -0.1129 | -0.1708 | 0.8714 | 0.0579 | -1.7085 | -1.1294 | -0.5079 | -0.3998 | 0.1369 | 10.3529 |
| 0.2668 | 2.5397 | 100 | 0.1508 | -0.0126 | -0.0716 | 0.8857 | 0.0590 | -0.7158 | -0.1261 | -0.4935 | -0.3919 | 0.0160 | 1.3479 |
| 0.1305 | 3.8095 | 150 | 0.0939 | -0.0069 | -0.0601 | 0.8857 | 0.0531 | -0.6007 | -0.0692 | -0.4467 | -0.3595 | 0.0109 | 0.8298 |
| 0.126 | 5.0794 | 200 | 0.0885 | -0.0065 | -0.0608 | 0.8857 | 0.0542 | -0.6076 | -0.0653 | -0.4471 | -0.3614 | 0.0103 | 0.7822 |
| 0.0881 | 6.3492 | 250 | 0.0876 | -0.0064 | -0.0617 | 0.8857 | 0.0553 | -0.6175 | -0.0642 | -0.4433 | -0.3588 | 0.0102 | 0.7739 |
| 0.1042 | 7.6190 | 300 | 0.0846 | -0.0062 | -0.0635 | 0.8857 | 0.0573 | -0.6353 | -0.0623 | -0.4422 | -0.3590 | 0.0098 | 0.7473 |
| 0.1405 | 8.8889 | 350 | 0.0853 | -0.0063 | -0.0644 | 0.8857 | 0.0581 | -0.6435 | -0.0627 | -0.4405 | -0.3572 | 0.0099 | 0.7540 |
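
The separate SFT and odds-ratio loss columns suggest an ORPO-style combined objective. As a hedged reading only (the weighting factor λ is not stated in the card; λ = 0.1 is assumed because it reproduces the reported totals up to rounding), the step-300 evaluation numbers are consistent with

$$
\mathcal{L} \approx \mathcal{L}_{\text{SFT}} + \lambda \, \mathcal{L}_{\text{OR}},
\qquad 0.0098 + 0.1 \times 0.7473 \approx 0.0846 .
$$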

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • PyTorch 2.3.0
  • Datasets 2.19.0
  • Tokenizers 0.20.0