---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---

# Aira-2-124M-DPO-checkpoint-200

## Hyperparameters

```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.8
```
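
The `learning_rate`, `lr_scheduler_type`, `warmup_steps`, and `max_steps` settings together determine the learning rate at any given step. As a minimal sketch, assuming the schedule has the usual shape of a linear warmup followed by a half-cosine decay to zero (the form implemented by `get_cosine_schedule_with_warmup` in `transformers`), the plain-Python helper below reproduces it. The function name `cosine_lr_with_warmup` is illustrative and not part of any library, and step 200 is assumed from the checkpoint name.

```python
import math


def cosine_lr_with_warmup(step, base_lr=5e-5, warmup_steps=100, max_steps=2400):
    """Assumed schedule shape: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup training budget already consumed.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at the step this checkpoint is assumed to correspond to.
lr_at_200 = cosine_lr_with_warmup(200)
print(f"{lr_at_200:.4e}")  # 4.9767e-05
```

Under these assumptions the value at step 200 agrees with the `learning_rate` entry reported in the Logs section.
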

## Logs

| Key                | Value                 |
|--------------------|-----------------------|
| loss               | 0.2274                |
| learning_rate      | 4.976714865090827e-05 |
| rewards/chosen     | -33.849693298339844   |
| rewards/rejected   | -114.72045135498047   |
| rewards/accuracies | 0.9768750071525574    |
| rewards/margins    | 80.87075805664062     |
| logps/rejected     | -404.8834228515625    |
| logps/chosen       | -383.7469482421875    |
| logits/rejected    | -67.6454086303711     |
| logits/chosen      | -30.543472290039062   |
| epoch              | 0.05                  |
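
The reward entries are related by simple arithmetic. As a minimal sketch, assuming the logs follow the common DPO convention in which each implicit reward is `beta` times the difference between the policy and reference log-probabilities of a response (the convention used by TRL's `DPOTrainer`; this card does not name the trainer), `rewards/margins` is the chosen reward minus the rejected reward, and dividing a reward by `beta` recovers the implied average log-ratio. The variable names are illustrative.

```python
# Values copied from the logs; beta comes from extra_args in the hyperparameters.
beta = 0.8
rewards_chosen = -33.849693298339844
rewards_rejected = -114.72045135498047

# Margin between chosen and rejected rewards (means subtract linearly).
margin = rewards_chosen - rewards_rejected

# Implied average log(policy / reference) for each response type,
# assuming reward = beta * (policy_logp - reference_logp).
log_ratio_chosen = rewards_chosen / beta
log_ratio_rejected = rewards_rejected / beta

print(f"margin:             {margin:.4f}")  # 80.8708
print(f"log-ratio chosen:   {log_ratio_chosen:.4f}")
print(f"log-ratio rejected: {log_ratio_rejected:.4f}")
```

The computed margin matches the logged `rewards/margins`. The logged `loss` cannot be recovered from the mean margin in the same way, because the DPO loss is averaged per example before logging.
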
## Eval

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.2031|±  |0.0118|
|             |       |acc_norm|0.2491|±  |0.0126|
|toxigen      |      0|acc     |0.5521|±  |0.0162|
|             |       |acc_norm|0.4340|±  |0.0162|
|truthfulqa_mc|      1|mc1     |0.2485|±  |0.0151|
|             |       |mc2     |0.4368|±  |0.0153|
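
Each `Stderr` value can be turned into an approximate interval around its metric. As a minimal sketch, assuming a normal approximation (value ± 1.96 × stderr for a 95% interval), the snippet below computes the interval for every row. The `results` dictionary and the `ci95` helper are illustrative names; the numbers are copied from the table.

```python
# (task, metric) -> (value, stderr), copied from the evaluation table.
results = {
    ("arc_challenge", "acc"): (0.2031, 0.0118),
    ("arc_challenge", "acc_norm"): (0.2491, 0.0126),
    ("toxigen", "acc"): (0.5521, 0.0162),
    ("toxigen", "acc_norm"): (0.4340, 0.0162),
    ("truthfulqa_mc", "mc1"): (0.2485, 0.0151),
    ("truthfulqa_mc", "mc2"): (0.4368, 0.0153),
}


def ci95(value, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation (an assumption)."""
    return value - z * stderr, value + z * stderr


for (task, metric), (value, stderr) in results.items():
    low, high = ci95(value, stderr)
    print(f"{task:14s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```
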