---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
# Aira-2-124M-DPO-checkpoint-200
## Hyperparameters
```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.8
```
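With `lr_scheduler_type: "cosine"`, `warmup_steps: 100` and `max_steps: 2400`, the learning rate should ramp linearly up to `learning_rate` and then follow a half-cosine decay to zero. The sketch below is a standalone re-implementation, assuming the schedule matches the linear-warmup cosine schedule (one half-cycle) used by `transformers`; the function name `cosine_lr` is hypothetical. At step 200 it gives roughly 4.9767e-05, consistent with the `learning_rate` reported under Logs.

```python
import math


def cosine_lr(step: int, base_lr: float = 5e-5,
              warmup_steps: int = 100, max_steps: int = 2400) -> float:
    """Hypothetical helper: linear warmup, then a half-cosine decay to 0.

    Assumed to mirror the "cosine" scheduler with warmup in transformers.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * (step / max(1, warmup_steps))
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Half a cosine period: the multiplier falls from 1.0 to 0.0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at a few points of the 2400-step run.
for step in (50, 100, 200, 1250, 2400):
    print(step, cosine_lr(step))
```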
## Logs
Training metrics logged at step 200, the step at which this checkpoint was saved:
| Key | Value |
|-----------------------|---------------------------------|
| loss | 0.2274 |
| learning_rate | 4.976714865090827e-05 |
| rewards/chosen | -33.849693298339844 |
| rewards/rejected | -114.72045135498047 |
| rewards/accuracies | 0.9768750071525574 |
| rewards/margins | 80.87075805664062 |
| logps/rejected | -404.8834228515625 |
| logps/chosen | -383.7469482421875 |
| logits/rejected | -67.6454086303711 |
| logits/chosen | -30.543472290039062 |
| epoch | 0.05 |
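The keys in this table appear to follow the metric names logged by TRL's `DPOTrainer`. As a rough guide to how they relate, the sketch below assumes the standard sigmoid DPO loss, where each implicit reward is `beta` times the policy-minus-reference log-probability of a response (`beta: 0.8` here). `rewards/margins` is then the mean of chosen minus rejected rewards (-33.85 - (-114.72) ≈ 80.87 in the table), and `rewards/accuracies` is the fraction of pairs whose chosen reward is strictly higher. The function `dpo_batch_stats` and the toy numbers are hypothetical; because the logged values are averages over many batches, plugging them into the formula will not reproduce the logged `loss`.

```python
import math
from typing import Dict, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_batch_stats(policy_chosen: Sequence[float],
                    policy_rejected: Sequence[float],
                    ref_chosen: Sequence[float],
                    ref_rejected: Sequence[float],
                    beta: float = 0.8) -> Dict[str, float]:
    """Hypothetical helper: sigmoid DPO loss and reward stats for one batch.

    Each argument holds the summed log-probability of a full response,
    one entry per preference pair.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (log pi_policy - log pi_ref).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Per-pair loss: -log(sigmoid(margin)) == softplus(-margin).
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n,
        "rewards/margins": sum(margins) / n,
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# Toy batch of two preference pairs (made-up numbers).
stats = dpo_batch_stats(
    policy_chosen=[-10.0, -30.0], policy_rejected=[-20.0, -25.0],
    ref_chosen=[-12.0, -31.0], ref_rejected=[-15.0, -27.0],
)
print(stats)
```

In the toy batch the second pair has a negative margin, so it contributes a large loss term and counts against `rewards/accuracies`.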
## Eval
| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.2031|± |0.0118|
| | |acc_norm|0.2491|± |0.0126|
|toxigen | 0|acc |0.5521|± |0.0162|
| | |acc_norm|0.4340|± |0.0162|
|truthfulqa_mc| 1|mc1 |0.2485|± |0.0151|
| | |mc2 |0.4368|± |0.0153|
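To read the Stderr column, a rough 95% interval can be formed as Value ± 1.96 × Stderr, assuming each reported standard error is that of a mean score and a normal approximation is adequate. The helper `normal_ci` below is hypothetical; applied to `acc_norm` on arc_challenge it gives roughly 0.224 to 0.274.

```python
from typing import Dict, Tuple

# (value, stderr) pairs copied from the Eval table above.
RESULTS: Dict[str, Tuple[float, float]] = {
    "arc_challenge/acc": (0.2031, 0.0118),
    "arc_challenge/acc_norm": (0.2491, 0.0126),
    "toxigen/acc": (0.5521, 0.0162),
    "toxigen/acc_norm": (0.4340, 0.0162),
    "truthfulqa_mc/mc1": (0.2485, 0.0151),
    "truthfulqa_mc/mc2": (0.4368, 0.0153),
}


def normal_ci(value: float, stderr: float, z: float = 1.96) -> Tuple[float, float]:
    """Hypothetical helper: normal-approximation interval value ± z * stderr."""
    return value - z * stderr, value + z * stderr


for name, (value, stderr) in RESULTS.items():
    low, high = normal_ci(value, stderr)
    print(f"{name:24s} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```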