---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
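# Aira-2-124M-DPO-checkpoint-200

Checkpoint saved at step 200 of 2,400 during DPO fine-tuning of `nicholasKluge/Aira-2-124M` on the English split of `nicholasKluge/reward-aira-dataset`.

## Hyperparameters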
```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"  # policy model being fine-tuned
  model_ref: "nicholasKluge/Aira-2-124M"   # frozen reference model for DPO
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150   # tokens
  max_length: 600          # tokens (prompt + response)
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200          # this card describes the step-200 checkpoint
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.8                # DPO beta: strength of the implicit KL penalty toward model_ref
```
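The `learning_rate`, `lr_scheduler_type`, `warmup_steps`, and `max_steps` settings together determine the learning rate at every step. The following is a minimal sketch, assuming the Hugging Face `Trainer` behaviour for `cosine`: a linear warmup from 0 to the peak rate, followed by a half-cosine decay that reaches 0 at `max_steps`. The function name `cosine_lr` is hypothetical. Under that assumption, the sketch reproduces the `learning_rate` logged for this checkpoint.

```python
import math


def cosine_lr(step: int, peak_lr: float = 5e-5, warmup_steps: int = 100,
              max_steps: int = 2400) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper).

    Assumes the Hugging Face `cosine` schedule: linear warmup to `peak_lr`,
    then a half-cosine decay that reaches 0 at `max_steps`.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (50, 100, 200, 1250, 2400):
        print(f"step {step:>4}: lr = {cosine_lr(step):.6e}")
```

`cosine_lr(200)` evaluates to about 4.9767e-05, which matches the `learning_rate` entry in the Logs table. At that point the decay has barely started, because only 100 of the 2,300 post-warmup steps have elapsed.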
## Logs

| Key | Value |
|---|---|
| loss | 0.2274 |
| learning_rate | 4.976714865090827e-05 |
| rewards/chosen | -33.849693298339844 |
| rewards/rejected | -114.72045135498047 |
| rewards/accuracies | 0.9768750071525574 |
| rewards/margins | 80.87075805664062 |
| logps/rejected | -404.8834228515625 |
| logps/chosen | -383.7469482421875 |
| logits/rejected | -67.6454086303711 |
| logits/chosen | -30.543472290039062 |
| epoch | 0.05 |
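The `rewards/*` entries follow the DPO formulation. The following is a minimal sketch, assuming TRL's `DPOTrainer` conventions with the default sigmoid loss and no label smoothing:

- The implicit reward of a response is `beta` times the difference between its summed log-probability under the policy and under the reference model (`model_ref`).
- The margin is the chosen reward minus the rejected reward.
- The per-pair loss is `-log(sigmoid(margin))`.

The helper names and the toy log-probabilities in the example are hypothetical and are not taken from this run.

```python
import math
from dataclasses import dataclass


@dataclass
class PairStats:
    reward_chosen: float
    reward_rejected: float
    margin: float
    loss: float
    correct: bool  # contributes 1.0 to rewards/accuracies when True


def neg_log_sigmoid(m: float) -> float:
    """Numerically stable -log(sigmoid(m)) == log(1 + exp(-m))."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def dpo_pair_stats(policy_chosen: float, policy_rejected: float,
                   ref_chosen: float, ref_rejected: float,
                   beta: float = 0.8) -> PairStats:
    """DPO statistics for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a response given its
    prompt, under either the policy or the reference model.
    """
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    return PairStats(reward_chosen, reward_rejected, margin,
                     neg_log_sigmoid(margin), margin > 0)


if __name__ == "__main__":
    # Toy log-probabilities for illustration only (not from this training run).
    print(dpo_pair_stats(policy_chosen=-380.0, policy_rejected=-400.0,
                         ref_chosen=-350.0, ref_rejected=-300.0))

    # The logged metrics are batch means. The margin is linear in the rewards,
    # so the mean margin equals mean chosen reward minus mean rejected reward.
    mean_chosen, mean_rejected = -33.849693298339844, -114.72045135498047
    print(f"implied rewards/margins: {mean_chosen - mean_rejected:.4f}")
```

The second print shows that `rewards/chosen` minus `rewards/rejected` reproduces the logged `rewards/margins`. The logged `loss`, however, is a mean of per-pair `-log(sigmoid(margin))` values, so it cannot be recomputed from the mean margin alone. Pairs with large positive margins contribute almost nothing to it, so the loss comes almost entirely from pairs whose margin is small or negative.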
## Eval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| arc_challenge | 0 | acc | 0.2031 | ± 0.0118 |
| | | acc_norm | 0.2491 | ± 0.0126 |
| toxigen | 0 | acc | 0.5521 | ± 0.0162 |
| | | acc_norm | 0.4340 | ± 0.0162 |
| truthfulqa_mc | 1 | mc1 | 0.2485 | ± 0.0151 |
| | | mc2 | 0.4368 | ± 0.0153 |