---
base_model: mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a DPO fine-tune of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) (itself a QLoRA fine-tune of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4757
- Rewards/chosen: -3.6825
- Rewards/rejected: -4.9601
- Rewards/accuracies: 0.7540
- Rewards/margins: 1.2776
- Logps/rejected: -740.5720
- Logps/chosen: -632.8636
- Logits/rejected: -1.1984
- Logits/chosen: -1.3150
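
The `Rewards/*` columns are the implicit DPO rewards that TRL logs: for a prompt \\(x\\) and completion \\(y\\), the reward is the \\(\beta\\)-scaled log-probability ratio between the policy and the frozen reference model, and the training loss maximizes the chosen-vs-rejected margin:

$$
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r(x, y_c) - r(x, y_r) \big)
$$

`Rewards/accuracies` is the fraction of evaluation pairs for which the chosen completion receives the higher reward.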

## Model description

zephyr-7b-dpo-qlora is a LoRA adapter for Mistral-7B produced by running Direct Preference Optimization (DPO) with QLoRA on top of the supervised fine-tuned checkpoint [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora), following the QLoRA variant of the Zephyr recipe from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook). This repository contains only the PEFT adapter weights, not a merged full model.
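
A minimal loading sketch with PEFT's `AutoPeftModelForCausalLM` (the repository id below is a placeholder; substitute the actual Hub path of this adapter):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder id -- replace with the actual Hub path of this adapter repo.
adapter_id = "your-username/zephyr-7b-dpo-qlora"

# Resolves the base model from the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The trainer normally saves the tokenizer next to the adapter; if it is
# missing, load it from the SFT repo instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
```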

## Intended uses & limitations

Like the Zephyr models whose recipe it follows, this adapter is intended for assistant-style chat using the Zephyr chat template inherited from the SFT checkpoint. It was preference-tuned for helpfulness on UltraFeedback and has not been separately aligned for safety, so outputs may be inaccurate or otherwise problematic; downstream applications should add their own filtering and evaluation. A minimal chat sketch is shown below.
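
A minimal generation sketch, assuming `model` and `tokenizer` are loaded as in the snippet above and that the tokenizer carries the Zephyr chat template:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]

# Format the conversation with the model's chat template and append the
# assistant prompt so generation starts a new reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```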

## Training and evaluation data

The model was trained and evaluated on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized version of UltraFeedback in which each prompt is paired with one chosen and one rejected completion. A loading sketch is shown below.
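
A minimal sketch, assuming the dataset's standard preference splits (`train_prefs` / `test_prefs`):

```python
from datasets import load_dataset

# Binarized UltraFeedback: each row pairs a prompt with a chosen and a
# rejected conversation.
raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_dataset = raw["train_prefs"]
eval_dataset = raw["test_prefs"]

print(train_dataset.column_names)  # e.g. ['prompt', 'chosen', 'rejected', ...]
```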

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
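
A hedged sketch of how these hyperparameters map onto TRL's `DPOConfig`/`DPOTrainer`. The DPO `beta`, the LoRA settings, the mixed-precision choice, and the exact TRL version are not recorded in this card, so the values marked as placeholders below are assumptions, not the actual training script:

```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# The SFT repo is itself a PEFT adapter; transformers' PEFT integration
# resolves the base weights and applies the SFT adapter on load.
model_id = "alignment-handbook/zephyr-7b-sft-qlora"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    # Values below are copied from the card.
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size 16, per the card
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.01,  # placeholder: beta is not recorded in this card
)

# Placeholder LoRA config: the actual adapter settings are not in this card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
    peft_config=peft_config,  # with a PEFT config, TRL builds the reference model implicitly
)
trainer.train()
```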

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6809        | 0.0262 | 100  | 0.6807          | 0.0519         | 0.0257           | 0.6580             | 0.0262          | -241.9869      | -259.4206    | -2.0558         | -2.1488       |
| 0.6438        | 0.0523 | 200  | 0.6351          | -0.1905        | -0.3429          | 0.6800             | 0.1524          | -278.8497      | -283.6621    | -2.0145         | -2.1026       |
| 0.5829        | 0.0785 | 300  | 0.6072          | -0.4462        | -0.7133          | 0.6780             | 0.2671          | -315.8949      | -309.2386    | -2.0508         | -2.1342       |
| 0.6201        | 0.1047 | 400  | 0.5892          | -1.4907        | -1.9543          | 0.6845             | 0.4636          | -439.9887      | -413.6829    | -1.6374         | -1.7202       |
| 0.5798        | 0.1309 | 500  | 0.5667          | -1.3123        | -2.0041          | 0.7020             | 0.6918          | -444.9709      | -395.8432    | -1.2046         | -1.3376       |
| 0.5395        | 0.1570 | 600  | 0.5524          | -1.2157        | -1.8227          | 0.7030             | 0.6069          | -426.8258      | -386.1879    | -1.1445         | -1.2781       |
| 0.5278        | 0.1832 | 700  | 0.5336          | -3.1382        | -4.0509          | 0.7265             | 0.9127          | -649.6522      | -578.4380    | -0.6999         | -0.8394       |
| 0.4969        | 0.2094 | 800  | 0.5242          | -1.8373        | -2.6256          | 0.7245             | 0.7883          | -507.1189      | -448.3450    | -1.1250         | -1.2524       |
| 0.4794        | 0.2355 | 900  | 0.5246          | -2.0059        | -2.8266          | 0.7255             | 0.8207          | -527.2198      | -465.2022    | -0.8588         | -0.9944       |
| 0.5261        | 0.2617 | 1000 | 0.5109          | -2.8850        | -3.8029          | 0.7395             | 0.9179          | -624.8492      | -553.1188    | -0.6716         | -0.8193       |
| 0.6001        | 0.2879 | 1100 | 0.5050          | -2.4905        | -3.3317          | 0.7375             | 0.8412          | -577.7299      | -513.6636    | -0.6634         | -0.8245       |
| 0.5911        | 0.3141 | 1200 | 0.4983          | -2.2735        | -3.2228          | 0.7385             | 0.9493          | -566.8434      | -491.9688    | -0.9871         | -1.1192       |
| 0.5345        | 0.3402 | 1300 | 0.5001          | -3.5214        | -4.7330          | 0.7450             | 1.2115          | -717.8565      | -616.7566    | -0.8540         | -0.9911       |
| 0.5291        | 0.3664 | 1400 | 0.4987          | -2.7865        | -3.7479          | 0.7475             | 0.9614          | -619.3545      | -543.2670    | -1.0816         | -1.2062       |
| 0.4495        | 0.3926 | 1500 | 0.5144          | -2.4600        | -3.6484          | 0.7330             | 1.1884          | -609.4039      | -510.6184    | -1.1934         | -1.3216       |
| 0.5586        | 0.4187 | 1600 | 0.4937          | -2.4987        | -3.5027          | 0.7430             | 1.0040          | -594.8329      | -514.4847    | -1.1838         | -1.3066       |
| 0.4895        | 0.4449 | 1700 | 0.4948          | -3.6212        | -4.8051          | 0.7295             | 1.1839          | -725.0694      | -626.7305    | -0.9648         | -1.1064       |
| 0.4850        | 0.4711 | 1800 | 0.4885          | -4.0215        | -5.2285          | 0.7525             | 1.2070          | -767.4141      | -666.7680    | -1.0276         | -1.1613       |
| 0.4387        | 0.4973 | 1900 | 0.4897          | -3.8136        | -5.0345          | 0.7460             | 1.2208          | -748.0074      | -645.9786    | -1.1075         | -1.2419       |
| 0.4613        | 0.5234 | 2000 | 0.4941          | -4.5643        | -5.6977          | 0.7410             | 1.1334          | -814.3307      | -721.0457    | -0.9859         | -1.1242       |
| 0.4939        | 0.5496 | 2100 | 0.4877          | -4.6441        | -5.8517          | 0.7500             | 1.2077          | -829.7325      | -729.0210    | -1.1445         | -1.2699       |
| 0.4782        | 0.5758 | 2200 | 0.4813          | -3.2786        | -4.2916          | 0.7485             | 1.0130          | -673.7171      | -592.4716    | -1.2439         | -1.3665       |
| 0.4682        | 0.6019 | 2300 | 0.4885          | -4.1629        | -5.5525          | 0.7455             | 1.3897          | -799.8126      | -680.9020    | -1.0667         | -1.1952       |
| 0.4582        | 0.6281 | 2400 | 0.4859          | -3.7434        | -4.9841          | 0.7460             | 1.2407          | -742.9675      | -638.9534    | -1.0476         | -1.1735       |
| 0.4948        | 0.6543 | 2500 | 0.4817          | -3.6128        | -4.8362          | 0.7425             | 1.2234          | -728.1769      | -625.8918    | -1.0472         | -1.1781       |
| 0.4588        | 0.6805 | 2600 | 0.4854          | -3.5980        | -4.8557          | 0.7430             | 1.2577          | -730.1331      | -624.4171    | -1.1158         | -1.2400       |
| 0.5354        | 0.7066 | 2700 | 0.4857          | -4.1262        | -5.3649          | 0.7445             | 1.2387          | -781.0517      | -677.2343    | -1.0720         | -1.1950       |
| 0.4782        | 0.7328 | 2800 | 0.4822          | -3.8568        | -5.1115          | 0.7460             | 1.2547          | -755.7133      | -650.2979    | -1.1544         | -1.2733       |
| 0.5135        | 0.7590 | 2900 | 0.4807          | -3.9503        | -5.2306          | 0.7475             | 1.2804          | -767.6244      | -659.6406    | -1.1773         | -1.2961       |
| 0.4613        | 0.7851 | 3000 | 0.4783          | -3.6454        | -4.8177          | 0.7545             | 1.1723          | -726.3349      | -629.1588    | -1.1940         | -1.3123       |
| 0.4904        | 0.8113 | 3100 | 0.4787          | -3.8925        | -5.1623          | 0.7535             | 1.2698          | -760.7857      | -653.8602    | -1.1654         | -1.2847       |
| 0.4706        | 0.8375 | 3200 | 0.4755          | -3.4858        | -4.6973          | 0.7525             | 1.2116          | -714.2923      | -613.1915    | -1.2139         | -1.3301       |
| 0.5190        | 0.8636 | 3300 | 0.4762          | -3.6863        | -4.9393          | 0.7525             | 1.2530          | -738.4901      | -633.2412    | -1.1986         | -1.3147       |
| 0.4446        | 0.8898 | 3400 | 0.4762          | -3.8220        | -5.1066          | 0.7535             | 1.2847          | -755.2252      | -646.8135    | -1.1676         | -1.2864       |
| 0.5378        | 0.9160 | 3500 | 0.4759          | -3.7562        | -5.0452          | 0.7530             | 1.2890          | -749.0795      | -640.2327    | -1.1933         | -1.3106       |
| 0.4506        | 0.9422 | 3600 | 0.4759          | -3.7087        | -4.9945          | 0.7535             | 1.2857          | -744.0071      | -635.4867    | -1.1944         | -1.3115       |
| 0.4732        | 0.9683 | 3700 | 0.4758          | -3.6903        | -4.9695          | 0.7540             | 1.2792          | -741.5083      | -633.6405    | -1.1938         | -1.3109       |
| 0.5041        | 0.9945 | 3800 | 0.4758          | -3.6841        | -4.9619          | 0.7545             | 1.2778          | -740.7547      | -633.0256    | -1.1922         | -1.3094       |


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1