license: apache-2.0
datasets:
  - nicholasKluge/reward-aira-dataset
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - text-generation-inference

Aira-2-124M-DPO-checkpoint-200

Hyperparameters

model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.9
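
The learning-rate settings above (`learning_rate`, `lr_scheduler_type: "cosine"`, `warmup_steps`, `max_steps`) can be sanity-checked with a few lines of arithmetic. The sketch below assumes the common schedule shape of linear warmup followed by a half-cosine decay to zero, as implemented by the `get_cosine_schedule_with_warmup` helper in `transformers`. The training script is not part of this card, so the schedule shape is an assumption, and `cosine_lr` is a hypothetical helper name used only for illustration.

```python
import math

# Values copied from the training_args block above.
BASE_LR = 0.00005
WARMUP_STEPS = 100
MAX_STEPS = 2400


def cosine_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Learning rate at `step`: linear warmup, then half-cosine decay.

    Assumed schedule shape; hypothetical helper, not taken from the training script.
    """
    if step < warmup:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


lr_at_200 = cosine_lr(200)
print(f"learning rate at step 200: {lr_at_200:.6e}")
```

Under these assumptions the value at step 200 agrees closely with the `learning_rate` entry in the Logs section (about 4.9767e-05), which is consistent with the logs having been recorded at the step-200 checkpoint.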

Logs

| Key | Value |
|---|---|
| loss | 0.2274 |
| learning_rate | 4.976714865090827e-05 |
| rewards/chosen | -33.849693298339844 |
| rewards/rejected | -114.72045135498047 |
| rewards/accuracies | 0.9768750071525574 |
| rewards/margins | 80.87075805664062 |
| logps/rejected | -404.8834228515625 |
| logps/chosen | -383.7469482421875 |
| logits/rejected | -67.6454086303711 |
| logits/chosen | -30.543472290039062 |
| epoch | 0.05 |
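
The reward entries above can be read through the standard DPO formulation: the implicit reward of a response is `beta` times the difference between the policy's and the reference model's log-probability of that response, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs with a positive margin. The key names resemble those logged by TRL's `DPOTrainer`, but the card does not say which trainer produced them, so treat the sketch below as an illustration of the usual definitions rather than the author's code. The function name `dpo_metrics` and the example log-probabilities are hypothetical.

```python
import math

BETA = 0.9  # `beta` from extra_args in the Hyperparameters section


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Per-pair DPO quantities from summed sequence log-probabilities.

    Hypothetical helper following the standard sigmoid DPO loss.
    """
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "correct": margin > 0,  # averaged over pairs, this gives the accuracy
        "loss": loss,
    }


# Made-up log-probabilities for one preference pair (NOT from the training run).
example = dpo_metrics(
    policy_chosen=-380.0, policy_rejected=-400.0,
    ref_chosen=-350.0, ref_rejected=-300.0,
)
print(example)

# Consistency check on the logged means: margin = chosen reward - rejected reward.
logged_margin = -33.849693298339844 - (-114.72045135498047)
print(logged_margin)
```

Because the logged values are averages, the margin relation holds for them by linearity, and the logged `rewards/margins` does equal `rewards/chosen` minus `rewards/rejected`. The logged `loss` is an average of per-pair losses, so it cannot be recovered from the average margin alone.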

Eval

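| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.2125 | ± 0.0120 |
| | | acc_norm | 0.2466 | ± 0.0126 |
| toxigen | 0 | acc | 0.5479 | ± 0.0162 |
| | | acc_norm | 0.4309 | ± 0.0162 |
| truthfulqa_mc | 1 | mc1 | 0.2546 | ± 0.0153 |
| | | mc2 | 0.4261 | ± 0.0154 |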
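
The standard errors above can be turned into rough uncertainty ranges. The sketch below assumes a normal approximation, under which an approximate 95% interval is the value plus or minus 1.96 standard errors. The helper name `approx_ci95` is hypothetical, and the numbers are copied from the table.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def approx_ci95(value, stderr):
    """Approximate 95% confidence interval under a normal approximation."""
    half_width = Z_95 * stderr
    return value - half_width, value + half_width


# (task, metric, value, stderr) rows copied from the Eval table.
results = [
    ("arc_challenge", "acc", 0.2125, 0.0120),
    ("arc_challenge", "acc_norm", 0.2466, 0.0126),
    ("toxigen", "acc", 0.5479, 0.0162),
    ("toxigen", "acc_norm", 0.4309, 0.0162),
    ("truthfulqa_mc", "mc1", 0.2546, 0.0153),
    ("truthfulqa_mc", "mc2", 0.4261, 0.0154),
]

for task, metric, value, stderr in results:
    low, high = approx_ci95(value, stderr)
    print(f"{task:14s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```

These intervals are wide relative to the differences one would expect between nearby checkpoints, so small changes in these scores should be read with caution.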