---
license: other
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Qwen/Qwen1.5-7B-Chat
model-index:
- name: Qwen1.5-7B-Dutch-Chat-Dpo
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen1.5-7B-Dutch-Chat-Dpo

This model is a fine-tuned version of [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503        | 0.1   | 30   | 0.4684          | -0.0439        | -0.6295          | 0.8919             | 0.5856          | -837.9513      | -769.8103    | -0.9335         | -0.8894       |
| 0.4178        | 0.2   | 60   | 0.3568          | -0.3713        | -1.4769          | 0.9015             | 1.1056          | -854.9000      | -776.3594    | -0.8768         | -0.8276       |
| 0.3264        | 0.29  | 90   | 0.3143          | -0.4893        | -1.8730          | 0.9151             | 1.3837          | -862.8228      | -778.7191    | -0.8428         | -0.7929       |
| 0.2999        | 0.39  | 120  | 0.2885          | -0.6832        | -2.3118          | 0.9151             | 1.6286          | -871.5981      | -782.5971    | -0.8260         | -0.7730       |
| 0.3454        | 0.49  | 150  | 0.2749          | -0.7239        | -2.4904          | 0.9189             | 1.7664          | -875.1693      | -783.4113    | -0.8235         | -0.7678       |
| 0.3354        | 0.59  | 180  | 0.2685          | -0.6775        | -2.4859          | 0.9170             | 1.8084          | -875.0795      | -782.4824    | -0.8130         | -0.7574       |
| 0.2848        | 0.68  | 210  | 0.2652          | -0.7157        | -2.5692          | 0.9131             | 1.8535          | -876.7465      | -783.2466    | -0.8157         | -0.7586       |
| 0.3437        | 0.78  | 240  | 0.2621          | -0.7233        | -2.6091          | 0.9151             | 1.8857          | -877.5430      | -783.3994    | -0.8138         | -0.7561       |
| 0.2655        | 0.88  | 270  | 0.2611          | -0.7183        | -2.6154          | 0.9151             | 1.8971          | -877.6708      | -783.2995    | -0.8106         | -0.7524       |
| 0.3442        | 0.98  | 300  | 0.2610          | -0.7248        | -2.6224          | 0.9170             | 1.8976          | -877.8102      | -783.4282    | -0.8110         | -0.7528       |


### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2